Abstract

In this paper, we introduce a new way of constructing and decoding multipermutation codes. 
Multipermutations are permutations of a multiset that generally consist of duplicate entries. 
We first introduce a class of binary matrices called multipermutation matrices, each of which 
corresponds to a unique and distinct multipermutation. By enforcing a set of linear constraints 
on these matrices, we define a new class of codes that we term LP-decodable multipermutation 
codes. In order to decode these codes using a linear program (LP), thereby enabling soft 
decoding, we characterize the convex hull of multipermutation matrices. This characterization 
allows us to relax the coding constraints to a polytope and to derive two LP decoding problems. 
These two problems are respectively formulated by relaxing the maximum likelihood decoding 
problem and the minimum Chebyshev distance decoding problem. 

Because these codes are non-linear, we also study efficient encoding and decoding algorithms. 
We first describe an algorithm that maps consecutive integers, one by one, to an ordered list 
of multipermutations. Based on this algorithm, we develop an encoding algorithm for a code 
proposed by Shieh and Tsai, a code that falls into our class of LP-decodable multipermutation 
codes. Regarding decoding algorithms, we propose an efficient distributed decoding algorithm 
based on the alternating direction method of multipliers (ADMM). Finally, we observe from 
simulation results that the soft decoding techniques we introduce can significantly outperform 
hard decoding techniques that are based on quantized channel outputs. 


1 Introduction 

Using permutations and multipermutations in communication systems dates back at least to [1],
where Slepian considered using multipermutations in a data transmission scheme for the additive
white Gaussian noise (AWGN) channel. In recent years, there has been growing interest in
permutation codes due to their usefulness in various applications such as powerline communications
(PLC) [2] and flash memories [3]. For PLC, permutation codes are proposed to deal with permanent
narrow-band noise and impulse noise while delivering constant transmission power (see also a)- 
In flash memories, information is stored in the pattern of charge levels of memory cells. Jiang et al.
proposed using the relative ranking of memory cells to modulate information [3]. This approach 
alleviates the over-injection problem during cell programming. In addition, it can reduce errors 
caused by charge leakage (cf. [3]). 

Error correction codes that use permutations are usually designed around a specific distance
metric over permutations. In the context of rank modulation, the commonly considered distance
metrics include the Kendall tau distance (e.g., [MD]), the Chebyshev distance (e.g., [ZUSUHHH]),
and the Ulam distance (e.g., [E]). Regardless of the choice of distance metric, these studies all
consider hard decoding algorithms. In other words, the objective of each decoder is to correct
some number of errors in the corresponding distance metric. In order to bring soft decoding
to permutation codes, Wadayama and Hagiwara introduce linear programming (LP) decoding of
permutation codes in [16]. Although the set of codes that can be decoded by LP decoding is
restrictive, the framework is promising for two reasons. First, the algorithm is soft-in soft-out,
differentiating itself from hard decoding algorithms based on quantized rankings of channel outputs,
and would therefore be expected to achieve lower error rates. Second, the algorithm is based
on solving an optimization problem, which makes it possible to incorporate future advances in 
optimization techniques. 

In this paper, we extend the idea in [16] to multipermutations. Multipermutations generalize
permutations by allowing multiple entries of the same value: a multipermutation is a permutation of
the multiset $\{1, 1, \dots, 1, 2, \dots, 2, \dots, m, \dots, m\}$. The number of entries of value $i$ in the multiset
is called the multiplicity of $i$. We denote by $r = (r_1, \dots, r_m)$ the multiplicity vector of a multiset,
where $r_i$ is the multiplicity of $i$; in other words, $r$ is the histogram of the multiset. Furthermore, a
multipermutation is called $r$-regular (for a scalar $r$) if $r_i = r$ for all $i$. A multipermutation code can be obtained
by selecting a subset of all multipermutations of the multiset. In the literature, multipermutation
codes are referred to as constant-composition codes when the Hamming distance is considered [17].
When $r_1 = r_2 = \cdots = r_m$, the multipermutations under consideration are known as frequency
permutation arrays [18]. Recently, multipermutation codes under the Kendall tau distance and
the Ulam distance have been studied in [19] and [20], respectively. As mentioned in [20], there are two
motivations for using multipermutation codes in rank modulation. First, the size of a codebook
based on multipermutations can be larger than that based on permutations (i.e., those in [21]).
Second, the number of distinct charges a flash memory can store is limited by the physical resolution
of the hardware and thus using permutations over large alphabets is impractical.

In fact, the construction of Wadayama and Hagiwara in [16] is defined over multipermutations.
However, in [16], multipermutations are described using permutation matrices. This results in
two notable issues. First, since the size of a permutation matrix scales quadratically with the 
length of the corresponding vector, the number of variables needed to specify a multipermutation 
in this representation scales quadratically with the length of the multipermutation. But since 
multipermutations consist of many replicated entries, one does not need to describe the relative 
positions among entries of the same value. This intuition suggests that one can use fewer variables 
to represent multipermutations. 

The second issue relates to code non-singularity as defined in [16]. To elaborate, we briefly
review some concepts therein. In [16], a codebook is obtained by permuting an initial (row) vector
$s$ with a set of permutation matrices. If $s$ contains duplicate entries, then there exist at least
two different permutation matrices $P_1$ and $P_2$ such that $sP_1 = sP_2$. This means that we cannot
differentiate between $sP_1$ and $sP_2$ by comparing the matrices $P_1$ and $P_2$. Due to this ambiguity,
permutation matrices are not perfect proxies for multipermutations. To see this, note that the
cardinality of a multipermutation code is not necessarily equal to the cardinality of the corresponding
set of permutation matrices. This makes it difficult to calculate codebook cardinality from
the set of permutation matrices. Furthermore, minimizing the Hamming distance between two
multipermutations is not equivalent to minimizing the Hamming distance between two permutation
matrices$^1$. This can be seen easily from the example above, where the Hamming distance between
$sP_1$ and $sP_2$ is zero, but the Hamming distance between $P_1$ and $P_2$ is greater than zero.

In this paper, we address the above two problems by introducing the concept of multipermutation
matrices. Multipermutation matrices and multipermutations are in a one-to-one relationship.
In comparison to a permutation matrix, a multipermutation matrix is a more compact
representation of a multipermutation. Further, due to the one-to-one relationship, we can calculate the
cardinality of a multipermutation code by calculating the cardinality of the associated set of
multipermutation matrices. In order to construct codes that can be decoded using LP decoding, we develop
a simple characterization of the convex hull of multipermutation matrices. The characterization
is analogous to the well known Birkhoff polytope (cf. [22]). These results form the basis for the
code constructions that follow. They may also be of independent interest to the optimization
community. We consider the introduction of multipermutation matrices and the characterization
of their convex hull to be our first set of contributions.

Building on these results, our second set of contributions includes code definitions and decoding
problem formulations. By placing linear constraints on multipermutation matrices to select
a subset of multipermutations, we form codebooks that we term LP-decodable multipermutation
codes. Along this thread, we first present a simple and novel description of a code introduced by
Shieh and Tsai in [23] (ST codes) that has known rate and distance properties. Then, we study
two random coding ensembles and derive their size and distance properties. The code definitions
using multipermutation matrices immediately imply an LP decoding formulation. We first relax
the maximum likelihood (ML) decoding problem to form an LP decoding problem for arbitrary
memoryless channels. In particular, for the AWGN channel, our formulation is equivalent to the
decoding problem proposed in [16]. We then relax the minimum (Chebyshev) distance decoding
problem to derive an LP decoding scheme that minimizes the Chebyshev distance in a relaxed code
polytope.

Due to the non-linearity of multipermutation codes, we need efficient encoding and decoding
algorithms, which brings us to our third set of contributions. To the best of our knowledge,
there have been no encoding algorithms for the ST codes that were introduced in [23]. Therefore, we
first focus on encoding and introduce an algorithm that ranks all
$N = \frac{(\sum_{i=1}^{m} r_i)!}{\prod_{i=1}^{m} r_i!}$ multipermutations
that are parameterized by the multiplicity vector $r$. In other words, suppose all multipermutations
are ranked such that each corresponds to an index in $\{0, \dots, N - 1\}$. Our algorithm outputs the
index that corresponds to a given input multipermutation. We use this ranking algorithm as the
basis for developing a low-complexity encoding algorithm for the ST codes. Next, we develop
an efficient decoding algorithm based on the alternating direction method of multipliers (ADMM),
which has recently been used to develop efficient decoding algorithms for linear codes (e.g., [24]–[28]).
The ADMM decoding algorithm in this paper requires two subroutines that perform Euclidean
projections onto two distinct polytopes. Both projections can be solved efficiently, the first using
techniques proposed in [29] and the second using the algorithm developed in Appendix 10.

We list below our contributions in this paper.


1 This is defined as the number of disagreeing entries (cf. Section 3).

• We introduce the concept of multipermutation matrices (Section 3.1) and characterize the
convex hull of all multipermutation matrices (Section 3.2).

• We propose LP-decodable multipermutation codes (Section 4.1) and redefine a code introduced
by Shieh and Tsai (ST codes) using our framework (Section 4.2). Furthermore, we study two
random coding ensembles and compare ST codes with the ensemble average (Section 4.3 and
Appendix 9).

• We formulate two LP decoding problems (Section 5), one for maximizing the likelihood, and
the other for minimizing the Chebyshev distance.

• We derive an efficient encoding algorithm for ST codes (Section 6.1).

• We develop an ADMM decoding algorithm for solving the LP decoding problem for
memoryless channels (Section 6.2).

• We initiate the study of the initial vector estimation problem for rank modulation and propose a
turbo-equalization-like decoding algorithm (Appendix 11).


2 Preliminaries 

In this section, we briefly review the concept of permutation matrices and the code construction 
approach proposed in [16] . 

A length-$n$ permutation $\pi$ is a length-$n$ vector, each element of which is a distinct integer
between 1 and $n$, inclusive. Every permutation corresponds to a unique $n \times n$ permutation matrix.
A permutation matrix is a binary matrix in which every row sum and every column sum equals 1. In this
paper, all permutations (and multipermutations) are represented using row vectors. Thus, if $P$ is
the permutation matrix corresponding to the permutation $\pi$, then $\pi = \imath P$ where $\imath = (1, 2, \dots, n)$
is the identity permutation. We denote by $\Pi_n$ the set of all permutation matrices of size $n \times n$.

Definition 1 (cf. [16]) Let $K$ and $n$ be positive integers. Assume that $A \in \mathbb{Z}^{K \times n^2}$, $b \in \mathbb{Z}^{K}$, and
let “$\trianglelefteq$” represent a vector of “$\le$” or “$=$” relations. A set of linearly constrained permutation
matrices is defined by

$$\Pi(A, b, \trianglelefteq) := \{P \in \Pi_n \mid A\,\mathrm{vec}(P) \trianglelefteq b\}, \tag{2.1}$$

where $\mathrm{vec}(\cdot)$ is the operation of concatenating all columns of a matrix to form a column vector.

Definition 2 (cf. [16]) Assume the same set up as in Definition 1. Suppose also that a row vector
$s \in \mathbb{R}^n$ is given. The set of vectors $\Lambda(A, b, \trianglelefteq, s)$ given by

$$\Lambda(A, b, \trianglelefteq, s) := \{sP \in \mathbb{R}^n \mid P \in \Pi(A, b, \trianglelefteq)\} \tag{2.2}$$

is called an LP-decodable permutation code$^2$. The vector $s$ is called the “initial vector”, which is assumed to
be known by both the encoder and the decoder.

2 In this paper, we always let the initial vector be a row vector. Then $sP$ is again a row vector. This is different
from the notation followed by [16], where the authors consider column vectors.

Note that $s$ may contain duplicates, and can be a permutation of the following vector

$$s = (\underbrace{t_1, t_1, \dots, t_1}_{r_1}, \underbrace{t_2, t_2, \dots, t_2}_{r_2}, \dots, \underbrace{t_m, t_m, \dots, t_m}_{r_m}), \tag{2.3}$$

where $t_i \ne t_j$ for $i \ne j$ and there are $r_i$ entries with value $t_i$. In this paper, we denote by $r =
(r_1, \dots, r_m)$ the multiplicity vector, and let $t := (t_1, t_2, \dots, t_m)$. Due to this notation, $\sum_{i=1}^{m} r_i = n$.
Throughout the paper, we use $t$ to represent a vector with distinct entries. At this point, it is
easy to observe that the vector $s$ can be uniquely determined by $t$ and $r$. Therefore, in this paper,
we refer to $t$ as the “initial vector” instead of $s$.

As an important remark, we note that the initial vector does not have to be a vector of integers 
(s and t are in the reals). One can think of initial vectors as the actual charge levels of a flash 
memory programmed using the rank modulation scheme. For most of this paper, we assume that t 
is fixed and known to the decoder. However, this assumption does not necessarily hold in practice. 
In Appendix 11, we briefly discuss some initial work that considers what to do when t is unknown
to the decoder. 

3 Multipermutation matrices 

In this section, we introduce the concept of multipermutation matrices. Although, as in (2.2), we
can obtain a multipermutation of length $n$ by multiplying a length-$n$ initial vector by an $n \times n$
permutation matrix, this mapping is not one-to-one. Thus $|\Lambda(A, b, \trianglelefteq, s)| \le |\Pi(A, b, \trianglelefteq)|$, where the
inequality can be strict if there is at least one $i$ such that $r_i \ge 2$. As a motivating example, let a
multipermutation be $x = (1, 2, 1, 2)$. Consider the following two permutation matrices


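$$P_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad \text{and} \quad
P_2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$

Then $x = sP_1 = sP_2$, where $s = (1, 1, 2, 2)$. In fact there are a total of four permutation
matrices that can produce $x$.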
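
The count of four can be checked by brute force. The following is a minimal sketch, assuming each permutation matrix is encoded by a tuple `cols` (an illustrative encoding, not notation from the paper) in which row i has its single 1 in column `cols[i]`.

```python
from itertools import permutations

s = (1, 1, 2, 2)  # initial vector with duplicate entries
x = (1, 2, 1, 2)  # target multipermutation

def apply_perm(vec, cols):
    """Return vec * P for the permutation matrix P with P[i][cols[i]] = 1."""
    out = [None] * len(vec)
    for i, j in enumerate(cols):
        out[j] = vec[i]  # (vec * P)[j] = vec[i] whenever P[i][j] = 1
    return tuple(out)

# Enumerate all 4! permutation matrices and keep those that map s to x.
producers = [cols for cols in permutations(range(len(s)))
             if apply_perm(s, cols) == x]

for cols in producers:
    print(cols)
print("number of permutation matrices producing x:", len(producers))
```

In this 0-based encoding, the two matrices displayed above correspond to `(0, 2, 1, 3)` and `(2, 0, 3, 1)`; the other two matrices swap the roles of the two equal entries of s.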

To resolve this ambiguity, we now introduce multipermutation matrices, which are defined to be 
rectangular binary matrices parameterized by a multiplicity vector. Then, we discuss the advantages 
of using multipermutation matrices vis-a-vis permutation matrices. Finally, we show a theorem that 
characterizes the convex hull of multipermutation matrices, a theorem that is crucial to our code 
constructions and decoding algorithms. 

3.1 Introducing multipermutation matrices 

Recall that $r$ is a multiplicity vector of length $m$ and $n := \sum_{i=1}^{m} r_i$. We denote by $\mathsf{M}(r)$ the set of
all distinct multipermutations parameterized by the multiplicity vector $r$. We now define a set of
binary matrices that is in a one-to-one correspondence with $\mathsf{M}(r)$.

Definition 3 Given a multiplicity vector $r$ of length $m$ and $n = \sum_{i=1}^{m} r_i$, we call an $m \times n$ binary
matrix $X$ a multipermutation matrix parameterized by $r$ if $\sum_{i=1}^{m} X_{ij} = 1$ for all $j$ and
$\sum_{j=1}^{n} X_{ij} = r_i$ for all $i$. Denote by $\mathcal{M}(r)$ the set of all multipermutation matrices parameterized by $r$.

Using this definition, it is easy to build a bijective mapping between multipermutations and
multipermutation matrices. When the initial vector is $t$, the mapping $\mathsf{M}(r) \mapsto \mathcal{M}(r)$ can be
defined as follows: Let $x$ denote a multipermutation. Then, it is uniquely represented by the
multipermutation matrix $X$ such that $X_{ij} = 1$ if and only if $x_j = t_i$. Conversely, to obtain the
multipermutation $x$, one can simply calculate the product $tX$.

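Example 1 Let the multiplicity vector be $r = (2, 3, 2, 3)$, let $t = (1, 2, 3, 4)$, and let $x = (2, 1, 4, 1, 2, 3, 4, 4, 2, 3)$.
Then the corresponding multipermutation matrix is

$$X = \begin{pmatrix}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{pmatrix}.$$

The row sums of $X$ are $(2, 3, 2, 3)$ respectively. Further, $x = tX$.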
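
The mapping in both directions is short enough to sketch in code. The helper names `to_matrix` and `from_matrix` below are hypothetical; the sketch rebuilds the matrix of Example 1 and recovers x as the product tX.

```python
def to_matrix(x, t):
    """Multipermutation matrix of x: X[i][j] = 1 iff x[j] == t[i]."""
    return [[1 if xj == ti else 0 for xj in x] for ti in t]

def from_matrix(X, t):
    """Recover the multipermutation as the row-vector product t X."""
    n = len(X[0])
    return tuple(sum(t[i] * X[i][j] for i in range(len(t))) for j in range(n))

t = (1, 2, 3, 4)
x = (2, 1, 4, 1, 2, 3, 4, 4, 2, 3)

X = to_matrix(x, t)
row_sums = [sum(row) for row in X]        # should equal the multiplicity vector r
col_sums = [sum(col) for col in zip(*X)]  # every column holds exactly one 1

for row in X:
    print(row)
print("row sums:", row_sums)
print("t X     :", from_matrix(X, t))
```

Only m·n = 40 binary entries are stored here, compared with n² = 100 for a permutation matrix of the same length.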

Lemma 1 Let $t$ be an initial vector of length $m$ with $m$ distinct entries. Let $X$ and $Y$ be two
multipermutation matrices parameterized by a multiplicity vector $r$. Further, let $x = tX$ and
$y = tY$. Then $x = y$ if and only if $X = Y$.

Proof First, it is obvious that if $X = Y$ then $x = y$. Next, we show that if $X \ne Y$ then $x \ne y$.

We prove by contradiction. Assume that there exist two multipermutation matrices $X$ and $Y$
such that $X \ne Y$ and $x = y$. Then

$$x - y = t(X - Y).$$

Since $X \ne Y$, there exists at least one column $j$ such that $X$ has a 1 at the $k$-th row and $Y$ has a
1 at the $l$-th row where $k \ne l$. Then the $j$-th entry of $t(X - Y)$ would be $t_k - t_l \ne 0$ because all
entries of $t$ are different. This contradicts the assumption that $x - y = 0$. ■

At this point, one may wonder why this one-to-one relationship matters. We now discuss three 
aspects in which having a one-to-one relationship between multipermutations and multipermutation 
matrices is beneficial. 


3.1.1 Reduction in the number of variables 

One immediate advantage of using multipermutation matrices is that they require fewer variables 
to represent multipermutations. (This was the first issue discussed in the Introduction.) The 
multipermutation matrix corresponding to a length-n multipermutation has size m x n, where m 
is the number of distinct values in the multipermutation. 

This benefit can be significant when the multiplicities are large, i.e., when m is much smaller 
than n. For example, a triple level cell flash memory has 8 states per cell. If a multipermutation 
code is of length 100, then one needs an 8 x 100 multipermutation matrix to represent a codeword. 
The corresponding permutation matrix has size 100 x 100. 
3.1.2 Analyzing codes via binary matrices

Also due to the one-to-one relationship, one can use multipermutation matrices as a proxy for
multipermutations. By this we mean that one can analyze properties of multipermutation codes
by analyzing the associated multipermutation matrices. First, note that a set of multipermutation
matrices of cardinality M can be mapped to a set of multipermutations of cardinality M. This
means that one can determine the size of a multipermutation code by characterizing the cardinality
of the corresponding set of multipermutation matrices. As an example, in Section 4.3 we analyze
the average cardinality of two random coding ensembles. Second, one can determine distance
properties of a multipermutation code via the set of multipermutation matrices. One such example
is the Hamming distance, which is demonstrated in detail in the following.


3.1.3 The Hamming distance 

The Hamming distance between two multipermutations is defined as the number of entries in which
the two vectors differ from each other. More formally, let $x$ and $y$ be two multipermutations; then
$d_H(x, y) = |\{i \mid x_i \ne y_i\}|$. Due to Lemma 1, we can express the Hamming distance between two
multipermutations using their corresponding multipermutation matrices.

Lemma 2 Let $X$ and $Y$ be two multipermutation matrices, and let $t$ be an initial vector with
distinct entries. With a small abuse of notation, denote by $d_H(X, Y)$ the Hamming distance
between the two matrices, which is defined by $d_H(X, Y) := |\{(i, j) \mid X_{ij} \ne Y_{ij}\}|$. Then

$$d_H(X, Y) = 2\, d_H(x, y),$$

where $x = tX$ and $y = tY$. Furthermore,

$$d_H(x, y) = \mathrm{tr}(X^T (E - Y)),$$

where $\mathrm{tr}(\cdot)$ represents the trace of the matrix and $E$ is an $m \times n$ matrix with all entries equal to 1.$^3$

Proof For all $j$ such that $x_j \ne y_j$, the $j$-th column of $X$ differs from the $j$-th column of $Y$ by
two entries. As a result, the distance between multipermutation matrices is double the distance
between the corresponding multipermutations.

Next, $\mathrm{tr}(X^T (E - Y)) = \sum_{i,j} X_{ij}(1 - Y_{ij})$. The term $X_{ij}(1 - Y_{ij})$ equals 1 if $X_{ij} = 1$ and
$Y_{ij} = 0$, and equals 0 otherwise. Each column $j$ with $x_j \ne y_j$ contains exactly one such entry, and
each column with $x_j = y_j$ contains none. Therefore $d_H(x, y) = \mathrm{tr}(X^T (E - Y))$. ■

The above two points relate to the second issue regarding the redundancy in the representation 
of [16], as was discussed in the Introduction. 
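
The two distance identities can be checked exhaustively on a small case. The sketch below is illustrative only; it assumes the multiplicity vector r = (2, 2, 1) with t = (1, 2, 3), and the helper names are hypothetical. It verifies that the matrix distance is twice the vector distance and that the trace expression tr(X^T (E - Y)) counts the positions in which x and y differ.

```python
from itertools import permutations

def to_matrix(x, t):
    """X[i][j] = 1 iff x[j] == t[i]."""
    return [[1 if xj == ti else 0 for xj in x] for ti in t]

def hamming_vec(x, y):
    """Number of positions in which the two vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def hamming_mat(X, Y):
    """Number of entries in which the two matrices differ."""
    return sum(a != b for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def trace_form(X, Y):
    """tr(X^T (E - Y)) = sum over (i, j) of X[i][j] * (1 - Y[i][j])."""
    return sum(a * (1 - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

t = (1, 2, 3)
multiperms = sorted(set(permutations((1, 1, 2, 2, 3))))  # r = (2, 2, 1)

checked = 0
for x in multiperms:
    for y in multiperms:
        X, Y = to_matrix(x, t), to_matrix(y, t)
        assert hamming_mat(X, Y) == 2 * hamming_vec(x, y)
        assert trace_form(X, Y) == hamming_vec(x, y)
        checked += 1

print(len(multiperms), "multipermutations,", checked, "ordered pairs checked")
```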


3.2 Geometry of multipermutation matrices 

In this section we prove an important theorem that characterizes the convex hull of all
multipermutation matrices. As background, we first review the definition of doubly stochastic matrices.
Then, we state the Birkhoff-von Neumann theorem for permutation matrices. Finally, we build off
the Birkhoff-von Neumann theorem to prove our theorem. We refer readers to [22] and references
therein for more material on doubly stochastic matrices and related topics.

3 We note that $\mathrm{tr}(X^T Y) = \sum_{i,j} X_{ij} Y_{ij}$ is the Frobenius inner product of two equal-sized matrices.

Definition 4 An $n \times n$ matrix $Q$ is doubly stochastic if

(a) $Q_{ij} \ge 0$ for all $i, j$;

(b) $\sum_{i=1}^{n} Q_{ij} = 1$ for all $j$ and $\sum_{j=1}^{n} Q_{ij} = 1$ for all $i$.

The set of all doubly stochastic matrices is called the Birkhoff polytope. There is a close 
relationship between the Birkhoff polytope and the set of permutation matrices as the following 
theorem formalizes: 

Theorem 1 (Birkhoff-von Neumann Theorem, cf. [22]) The permutation matrices constitute the
extreme points of the set of doubly stochastic matrices. Moreover, the set of doubly stochastic
matrices is the convex hull of the permutation matrices.

This theorem is the basis for the decoding problem formulated in [16]. Namely, the LP relaxation
for codes defined by Definition 2 is based on the Birkhoff polytope. In order to formulate LP
decoding problems using multipermutation matrices, we need a similar theorem that characterizes
the convex hull of multipermutation matrices.

Denote by $\mathbb{M}(r)$ the convex hull of all multipermutation matrices parameterized by $r$, i.e.,
$\mathbb{M}(r) = \mathrm{conv}(\mathcal{M}(r))$. Then, $\mathbb{M}(r)$ is characterized by the following theorem.

Theorem 2 Let $r \in \mathbb{Z}_+^m$ and let $Z$ be an $m \times n$ matrix such that

(a) $\sum_{i=1}^{m} Z_{ij} = 1$ for all $j = 1, \dots, n$.

(b) $\sum_{j=1}^{n} Z_{ij} = r_i$ for all $i = 1, \dots, m$.

(c) $Z_{ij} \in [0, 1]$ for all $i$ and $j$.

Then, $Z$ is a convex combination of multipermutation matrices parameterized by $r$. Conversely,
any convex combination of multipermutation matrices parameterized by $r$ satisfies the above
conditions.

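Proof Consider a multipermutation $x = (x_1, \dots, x_n)$ where each $x_k \in \{1, \dots, m\}$. Without loss
of generality, we assume that $x$ is in increasing order. We denote by $\mathcal{I}_i$ the index set for the $i$-th
symbol, i.e.,

$$\mathcal{I}_i := \left\{ \sum_{l=1}^{i-1} r_l + 1, \dots, \sum_{l=1}^{i} r_l \right\}. \tag{3.1}$$

Then $x_k = i$ if $k \in \mathcal{I}_i$. Let $X$ be the corresponding $m \times n$ multipermutation matrix. Then $X$ has
the following form

$$X = \begin{pmatrix}
\overbrace{1 \ \cdots \ 1}^{r_1} & & & 0 \\
& \overbrace{1 \ \cdots \ 1}^{r_2} & & \\
& & \ddots & \\
0 & & & \overbrace{1 \ \cdots \ 1}^{r_m}
\end{pmatrix},$$

where $X_{ik} = 1$ if $k \in \mathcal{I}_i$ and $X_{ik} = 0$ otherwise.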




Note that all multipermutation matrices parameterized by a fixed $r$ are column permutations
of each other. Of course, as already pointed out, not all distinct permutations of columns yield
distinct multipermutation matrices. To show that any $Z$ satisfying (a)-(c) is a convex combination
of multipermutation matrices, we show that there exists an $n \times n$ doubly stochastic matrix $Q$ such that
$Z = XQ$. Then by Theorem 1, $Q$ can be expressed as a convex combination of permutation
matrices. In other words, $Q = \sum_h \alpha_h P_h$ where $P_h \in \Pi_n$ are permutation matrices, $\alpha_h \ge 0$ for all
$h$, and $\sum_h \alpha_h = 1$. Then we have

$$Z = XQ = X \sum_h \alpha_h P_h = \sum_h \alpha_h (X P_h),$$

where $XP_h$ is a column-permuted version of the matrix $X$, which is a multipermutation matrix of
multiplicity $r$. This implies that $Z$ is a convex combination of multipermutation matrices.

We construct the required $n \times n$ matrix $Q$ in the following way. For each $i \in \{1, 2, \dots, m\}$, let
$q^i$ be a length-$n$ column vector with $q^i_j = \frac{1}{r_i} Z_{ij}$ for $j = 1, \dots, n$. Then the $n \times n$ matrix $Q$ is defined by

$$Q^T := [\underbrace{q^1 | \cdots | q^1}_{r_1 \text{ of them}} | \underbrace{q^2 | \cdots | q^2}_{r_2 \text{ of them}} | \cdots | \underbrace{q^m | \cdots | q^m}_{r_m \text{ of them}}]. \tag{3.2}$$

In other words, $Q_{kj} = \frac{1}{r_i} Z_{ij}$ for all $k \in \mathcal{I}_i$ and $j = 1, \dots, n$. We now verify that $Z = XQ$ and that
$Q$ is doubly stochastic, which by our discussions above implies that $Z$ is a convex combination of
column-wise permutations of $X$.

1. To verify $Z = XQ$, we need to show that $Z_{ij} = \sum_{k=1}^{n} X_{ik} Q_{kj}$. Since $X$ is a binary matrix,

$$\sum_{k=1}^{n} X_{ik} Q_{kj} = \sum_{k : X_{ik} = 1} Q_{kj}.$$

In addition, since $x$ is sorted, $X_{ik} = 1$ if and only if $k \in \mathcal{I}_i$. By the definition of $Q$, $Q_{kj} = \frac{1}{r_i} Z_{ij}$
for all $k \in \mathcal{I}_i$. Therefore

$$\sum_{k : X_{ik} = 1} Q_{kj} = r_i \cdot \frac{1}{r_i} Z_{ij} = Z_{ij}.$$

2. Next we verify that $Q$ is a doubly stochastic matrix. Since $0 \le Z_{ij} \le 1$ for all $i, j$, $Q_{ij} \ge 0$
for all $i, j$. By the definition of $Q$, the sum of each row is $\|q^i\|_1$ for some $i$. Thus $\|q^i\|_1 =
\sum_{j=1}^{n} \frac{1}{r_i} Z_{ij} = 1$ by condition (b). The sum of each column is

$$\sum_{k=1}^{n} Q_{kj} = \sum_{i=1}^{m} \sum_{k \in \mathcal{I}_i} Q_{kj} = \sum_{i=1}^{m} \sum_{k \in \mathcal{I}_i} \frac{1}{r_i} Z_{ij} = \sum_{i=1}^{m} Z_{ij} = 1,$$

where the last equality is due to condition (a).

To summarize, for any given real matrix $Z$ satisfying conditions (a)-(c) we can find a doubly
stochastic matrix $Q$ such that $Z = XQ$ for a particular multipermutation matrix $X$. This implies that
$Z$ is a convex combination of multipermutation matrices.

The converse is easy to verify by the definition of convex combinations and therefore is omitted.
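
The construction of Q in the proof can be exercised numerically. The following is a sketch under illustrative assumptions: the multiplicity vector r = (2, 1, 2), a point Z obtained as an arbitrary convex combination of three multipermutation matrices, 0-based symbols, and exact rational arithmetic. It builds Q row by row as in the proof and checks that Z = XQ and that Q is doubly stochastic.

```python
from fractions import Fraction
from itertools import permutations

r = (2, 1, 2)                       # multiplicity vector (illustrative choice)
m, n = len(r), sum(r)
# Sorted multipermutation with 0-based symbols: symbol i occupies the index block I_i.
sorted_x = tuple(i for i in range(m) for _ in range(r[i]))

def to_matrix(x):
    """m x n multipermutation matrix with X[i][j] = 1 iff x[j] == i."""
    return [[Fraction(int(xj == i)) for xj in x] for i in range(m)]

# A point Z satisfying (a)-(c): a convex combination of three multipermutation matrices.
verts = sorted(set(permutations(sorted_x)))
mats = [to_matrix(verts[k]) for k in (0, 7, 19)]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
Z = [[sum(w * M[i][j] for w, M in zip(weights, mats)) for j in range(n)]
     for i in range(m)]

X = to_matrix(sorted_x)
# Row k of Q equals Z[i] / r[i], where i = sorted_x[k] is the symbol whose block contains k.
Q = [[Z[sorted_x[k]][j] / r[sorted_x[k]] for j in range(n)] for k in range(n)]
XQ = [[sum(X[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
      for i in range(m)]

print("Z == XQ:", XQ == Z)
print("row sums of Q:", [str(sum(row)) for row in Q])
print("col sums of Q:", [str(sum(col)) for col in zip(*Q)])
```

Exact fractions are used instead of floating point so that the equalities can be tested without a tolerance.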




4 LP-decodable multipermutation code 

4.1 Constructing codes using linearly constrained multipermutation matrices 

Using multipermutation matrices as defined in Definition 3, we define the set of linearly constrained
multipermutation matrices analogous to that in [16].$^4$

Definition 5 Let $r$ be a length-$m$ multiplicity vector, and $n := \sum_{i=1}^{m} r_i$. Let $K$ be a positive integer.
Assume that $A \in \mathbb{Z}^{K \times (mn)}$, $b \in \mathbb{Z}^{K}$, and let “$\trianglelefteq$” represent a vector of “$\le$” or “$=$” relations. A
set of linearly constrained multipermutation matrices is defined as

$$\Pi^{M}(r, A, b, \trianglelefteq) := \{X \in \mathcal{M}(r) \mid A\,\mathrm{vec}(X) \trianglelefteq b\}, \tag{4.1}$$

where $\mathcal{M}(r)$ is the set of all multipermutation matrices parameterized by $r$.

Definition 6 Let $r$ be a length-$m$ multiplicity vector, and $n := \sum_{i=1}^{m} r_i$. Let $K$ be a positive
integer. Assume that $A \in \mathbb{Z}^{K \times (mn)}$, $b \in \mathbb{Z}^{K}$, and let “$\trianglelefteq$” represent a vector of “$\le$” or “$=$”
relations. Suppose also that $t \in \mathbb{R}^m$ is given. The set of vectors $\Lambda^{M}(r, A, b, \trianglelefteq, t)$ given by

$$\Lambda^{M}(r, A, b, \trianglelefteq, t) := \{tX \in \mathbb{R}^n \mid X \in \Pi^{M}(r, A, b, \trianglelefteq)\} \tag{4.2}$$

is called an LP-decodable multipermutation code.

We can relax the integer constraints and form a code polytope. Recall that $\mathbb{M}(r)$ is the convex
hull of all multipermutation matrices parameterized by $r$.

Definition 7 The polytope $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ defined by

$$\mathbb{P}^{M}(r, A, b, \trianglelefteq) := \mathbb{M}(r) \cap \{X \in \mathbb{R}^{m \times n} \mid A\,\mathrm{vec}(X) \trianglelefteq b\}$$

is called the “code polytope”. We note that $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ is a polytope because it is the intersection
of two polytopes.

Regarding the above definitions, we discuss some key ingredients. 

• Definition 5 defines the set of multipermutation matrices. Due to Lemma 1, this set uniquely
determines a set of multipermutations. The actual codeword that is transmitted (or stored in
a memory system) is also determined by the initial vector $t$, which depends on the modulation
scheme used in the system. Definition 6 is the set of codewords determined by $\Pi^{M}(r, A, b, \trianglelefteq)$
once the initial vector $t$ is taken into account.

• Definition 7 is useful for decoding. It will be discussed in detail in Section 5. As a preview, in
Section 5 we will formulate two optimization problems with variables constrained by the code
polytope $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$. In both optimizations, the objective functions will be related to the
initial vector $t$ but the constraints will only be a function of $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$. We emphasize
that $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ is not parameterized by $t$.

• $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ is defined as the intersection of two polytopes. It is not defined as the convex
hull of $\Lambda^{M}(r, A, b, \trianglelefteq, t)$, which is usually hard to describe. However, the intersection that
defines $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ may introduce fractional vertices, i.e., vertices $X \in \mathbb{R}^{m \times n}$ such that
$X_{ij} \in (0, 1)$. Because of this, we call $\mathbb{P}^{M}(r, A, b, \trianglelefteq)$ a relaxation of $\mathrm{conv}(\Lambda^{M}(r, A, b, \trianglelefteq, t))$.

4 The analogy is in the following sense. One can obtain the definitions in this section by restating the definition
in [16] using multipermutations and the convex hull $\mathbb{M}(r)$.

To better explore structures of LP-decodable multipermutation codes, we now define two
specific types of linear constraints.

Definition 8 Fixed-at-zero constraints: Let $\mathcal{Z}$ be a set of entries $(i, j)$. A code with a set of
fixed-at-zero constraints is defined by both $X \in \mathcal{M}(r)$ and $X_{ij} = 0$ for all $(i, j) \in \mathcal{Z}$.
Fixed-at-equality constraints: Let $\mathcal{E}$ be a set of entry pairs $(i, j), (k, l)$. A code with a set of fixed-at-equality
constraints is defined by both $X \in \mathcal{M}(r)$ and $X_{ij} = X_{kl}$ for all $(i, j), (k, l) \in \mathcal{E}$. These two types of
constraints can be combined.

We consider these two types of constraints because they are useful for defining LP-decodable
multipermutation and permutation codes, and because we can develop efficient decoding algorithms
for these codes. For example, the pure “involution” code introduced in [16] is constructed by
combining both constraints. In Section 4.2, we discuss two codes constructed using fixed-at-zero
constraints. In Section 4.3, we show random coding results for these two types of codes. Last but
not least, we show how to decode codes with fixed-at-zero, fixed-at-equality, or both constraints
using ADMM in Section 6.2.

Remarks 

A natural question to ask is whether the restriction to linear constraints reduces the space of
possible code designs. In our previous paper [30], we show that the answer to this question is
“No.” This follows because it is possible to define an arbitrary codebook using linear constraints.
As we show more formally in [30], one can add one linear constraint for each non-codeword, where
the linear constraint requires that the Hamming distance between any codeword and that
non-codeword be at least 1. However, this approach leads to an exponential growth in the number of
linear constraints. Thus, the interesting and challenging question is how to construct good codes
(in terms of rate and error performance) that can be described efficiently using linear constraints.
A related question that we study in [30] connects the description of LP-decodable permutation
codes (as defined in Definition 2) to that of LP-decodable multipermutation codes (as defined in
Definition 6). We show that codes described by Definition 6 can be restated using Definition 2
using the same number of linear constraints (the same “K” in Definitions 2 and 6). We refer readers
to [30] for the details of these results.
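
The one-constraint-per-non-codeword argument in the remarks above can be written out explicitly. The following is only a sketch and not necessarily the construction used in [30]; here $y$ denotes a hypothetical non-codeword with multipermutation matrix $Y$, and $X$ ranges over multipermutation matrices. Since $\sum_{i,j} Y_{ij}(1 - X_{ij})$ counts the positions in which $x$ and $y$ differ, and $\sum_{i,j} Y_{ij} = n$,

$$d_H(x, y) \ge 1 \iff \sum_{i,j} Y_{ij}(1 - X_{ij}) \ge 1 \iff \sum_{i,j} Y_{ij} X_{ij} \le n - 1.$$

The last inequality is linear in the entries of $X$. Among multipermutation matrices it is violated only by $X = Y$, so adding it removes exactly the multipermutation $y$ from the codebook.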

4.2 Examples of LP-decodable multipermutation codes 

We provide two examples of codes using Definition 6.

Example 2 (Derangement) A permutation $\pi$ is termed a “derangement” if $\pi_i \neq i$ for all $i \in 
\{1, \dots, n\}$. For multipermutations, we define a generalized notion of derangement as follows. Let 

$$\boldsymbol{\iota} = (\underbrace{1, 1, \dots, 1}_{r_1}, \underbrace{2, 2, \dots, 2}_{r_2}, \dots, \underbrace{m, m, \dots, m}_{r_m}).$$

Let $\boldsymbol{x}$ be a multipermutation obtained by permuting $\boldsymbol{\iota}$. We say that $\boldsymbol{x}$ is a derangement if $x_i \neq \iota_i$ 
for all $i$.

In [16], the authors use Definition [1] to define the set of derangements by letting $\operatorname{tr}(P) = 0$, 
where $P$ is a permutation matrix. We now extend this construction using Definition [6] and let the 
linear constraints on the multipermutation matrix $X$ be 

$$\sum_{j \in \mathcal{I}_i} X_{ij} = 0 \quad \text{for all } i = 1, \dots, m, \tag{4.3}$$

where $\mathcal{I}_i$ is the index set defined earlier. Suppose the initial vector is $\boldsymbol{t} = (1, 2, \dots, m)$; then (4.3) 
implies that symbol $i$ cannot appear at positions $\mathcal{I}_i$. For example, let $\boldsymbol{t} = (1,2,3)$ and $\boldsymbol{r} = (2,2,2)$. 
Then the allowed derangements that form the codebook are 

(3,3,1,1,2,2), (2,2,3,3,1,1), (2,3,1,3,2,1), 
(2,3,1,3,1,2), (2,3,3,1,2,1), (2,3,3,1,1,2), 
(3,2,1,3,2,1), (3,2,1,3,1,2), (3,2,3,1,2,1), 
(3,2,3,1,1,2).
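
The codebook listed in Example 2 can be reproduced by brute force. The following Python sketch enumerates all arrangements of the multiset with multiplicities (2, 2, 2) and keeps those that differ from the sorted arrangement in every position; the function name `multipermutation_derangements` is a hypothetical helper, and the enumeration is only practical for short lengths.

```python
from itertools import permutations

def multipermutation_derangements(r):
    """List all x obtained by permuting iota with x_i != iota_i for every i."""
    iota = tuple(s for s, mult in enumerate(r, start=1) for _ in range(mult))
    candidates = set(permutations(iota))      # distinct multipermutations
    return sorted(x for x in candidates
                  if all(xi != ii for xi, ii in zip(x, iota)))

codebook = multipermutation_derangements((2, 2, 2))
for word in codebook:
    print(word)
```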

Example 3 Shieh and Tsai study multipermutation codes under the Chebyshev distance. 
The Chebyshev distance between two multipermutations $\boldsymbol{x}$ and $\boldsymbol{y}$ is defined as 

$$d_\infty(\boldsymbol{x}, \boldsymbol{y}) = \max_i |x_i - y_i|. \tag{4.4}$$

We review the Shieh-Tsai (ST) code (cf. Construction 1 of Shieh and Tsai) in Definition 9. 

Definition 9 Let $\boldsymbol{r} = (r, r, \dots, r)$ be a length-$m$ vector. Let $d$ be an integer such that $d$ divides $m$. 
We define 

$$\mathcal{C}(r, m, d) = \{\boldsymbol{x} \in \mathsf{M}(\boldsymbol{r}) \mid \forall i \in \{1, \dots, mr\},\ x_i \equiv i \bmod d\}. \tag{4.5}$$

Although not originally presented that way by Shieh and Tsai, it is easy to verify that this code is an 
LP-decodable multipermutation code defined by fixed-at-zero constraints. The fixed-at-zero constraints 
are defined by the set $\mathcal{Z} = \{(i,j) \mid j = 1, \dots, n \text{ and } i \not\equiv j \bmod d\}$. As a concrete example, let $m = 6$, 
$r = 2$ and $d = 3$. Then the constraints are 

$$\begin{aligned}
X_{21} = X_{31} = X_{51} = X_{61} &= 0, \\
X_{12} = X_{32} = X_{42} = X_{62} &= 0, \\
&\ \ \vdots \\
X_{1,12} = X_{2,12} = X_{4,12} = X_{5,12} &= 0.
\end{aligned}$$

It is shown by Shieh and Tsai that this code has cardinality $\left(\frac{(ar)!}{(r!)^a}\right)^d$, where $a = m/d$. Further, the minimum 
Chebyshev distance of this code is $d$. In addition, for large values of $r$, the rate of the code is observed 
to be close to a theoretical upper bound on all codes of Chebyshev distance $d$. However, no encoding 
or decoding algorithms are presented by Shieh and Tsai. We discuss encoding and decoding algorithms for 
this code in Section 6.
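
As a sanity check of the statements in Example 3, the Python sketch below builds the ST code for $m = 6$, $r = 2$, $d = 3$ directly from the congruence $x_i \equiv i \bmod d$ and computes its size and minimum Chebyshev distance by exhaustive comparison. The function names `st_code` and `chebyshev` are hypothetical, and the construction by residue class is one convenient way to enumerate the code, not a method taken from Shieh and Tsai.

```python
from itertools import combinations, permutations, product

def st_code(m, r, d):
    """Enumerate all length-(m*r) words with x_i = i (mod d), each symbol used r times."""
    n = m * r
    per_class = []
    for k in range(1, d + 1):
        # Symbols congruent to k mod d, each with multiplicity r.
        symbols = [s for s in range(k, m + 1, d) for _ in range(r)]
        per_class.append(sorted(set(permutations(symbols))))
    code = []
    for choice in product(*per_class):
        x = [0] * n
        for k, sub in enumerate(choice):
            x[k::d] = sub            # 1-based positions k+1, k+1+d, ...
        code.append(tuple(x))
    return code

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

code = st_code(m=6, r=2, d=3)
min_dist = min(chebyshev(x, y) for x, y in combinations(code, 2))
print(len(code), min_dist)
```

For these parameters the count agrees with the cardinality formula, $(4!/(2!)^2)^3 = 216$, and the minimum distance equals $d = 3$.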





4.3 The random coding ensemble 

In this subsection, we study randomly constructed LP-decodable multipermutation codes. We focus 
on the ensembles generated either by fixed-at-zero constraints or by fixed-at-equality constraints. 
The randomness comes from choosing the respective constraint sets, $\mathcal{Z}$ or $\mathcal{E}$, uniformly at random. 
Unfortunately, as we show in Appendix 9.2, several results indicate that the ensemble average is 
not as good as ST codes, which are structured codes belonging to the ensemble. Therefore, we 
only briefly present our problem formulations and results in the main text and refer readers to 
Appendix [9] for more details.

We first introduce some additional notation. With a small abuse of notation, we denote by 
$\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})$ the set of multipermutation matrices constrained by fixed-at-zero constraints. Similarly, 
denote by $\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{E})$ the set of multipermutation matrices constrained by fixed-at-equality 
constraints. Denote by $k$ and $\ell$ the respective cardinalities of the sets $\mathcal{Z}$ and $\mathcal{E}$. Furthermore, denote by 
$\mathcal{S}^z(k)$ the set of all possible choices of $\mathcal{Z}$ that have $k$ elements; denote by $\mathcal{S}^e(\ell)$ the set of all possible 
choices of $\mathcal{E}$ that have $\ell$ elements. 

Note that we do not consider duplicated constraints. In other words, all entries in $\mathcal{E}$ (or $\mathcal{Z}$) are 
distinct from each other. This is different from the setup in [15, Sec. VI], where the authors allow 
repeated constraints. Consequently, the cardinalities of both types of constraints, i.e., $k$ and $\ell$, are 
limited. For example, since there are $mn - n$ zeros in a multipermutation matrix, $k$ should be less 
than or equal to $mn - n$; otherwise the cardinality of the code must be zero$^5$. On the other hand, 
fixed-at-equality constraints are constructed from entry pairs. There are $\binom{nm}{2}$ ways of choosing two 
entries from a multipermutation matrix. Therefore $\ell \le \binom{nm}{2}$.


Lemma 3 

$$|\mathcal{S}^z(k)| = \binom{nm}{k}, \qquad |\mathcal{S}^e(\ell)| = \binom{\binom{nm}{2}}{\ell}.$$


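Next, we draw $\mathcal{Z}$ (resp. $\mathcal{E}$) uniformly at random from the set $\mathcal{S}^z(k)$ (resp. $\mathcal{S}^e(\ell)$). As a result, 
for particular realizations $\mathcal{Z}$ and $\mathcal{E}$, $P(\mathcal{Z}) = \frac{1}{|\mathcal{S}^z(k)|}$ and $P(\mathcal{E}) = \frac{1}{|\mathcal{S}^e(\ell)|}$. When taking into account 
all possible choices, we can show the following lemma for multipermutation matrices.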


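Lemma 4 Consider a fixed multiplicity vector $\boldsymbol{r}$ and a fixed multipermutation matrix $X \in \mathcal{M}(\boldsymbol{r})$. 
Let $k$ and $\ell$ be fixed parameters, then 

$$|\{\mathcal{Z} \in \mathcal{S}^z(k) \mid X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})\}| = \binom{nm - n}{k}.$$

Similarly, 

$$|\{\mathcal{E} \in \mathcal{S}^e(\ell) \mid X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{E})\}| = \binom{\binom{nm-n}{2} + \binom{n}{2}}{\ell}.$$

Proof See Appendix 9.1.1. ■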

Following the methodology adopted in [16], we prove Proposition 1, which calculates the average 
number of multipermutation matrices that meet a randomly chosen set of either fixed-at-zero 
or fixed-at-equality constraints. Note that, because multipermutation matrices and multipermutations 
are in one-to-one correspondence, Proposition 1 actually calculates the average codebook size.

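5 The cardinality of a code may be zero even when $k$ is small, e.g., when $\mathcal{Z}$ fixes a whole column to zero. But for 
$k > mn - n$ distinct constraints, the code size is zero regardless of how we pick $\mathcal{Z}$. 

Proposition 1 Denote by $A(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}))$ the cardinality of the code. Then, 

$$\mathbb{E}[A(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}))] = \frac{\binom{nm-n}{k}\, |\mathcal{M}(\boldsymbol{r})|}{|\mathcal{S}^z(k)|},$$

where the expectation is taken over all possible choices of $\mathcal{Z} \in \mathcal{S}^z(k)$. 
Using the same notation, 

$$\mathbb{E}[A(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{E}))] = \frac{\binom{\binom{nm-n}{2} + \binom{n}{2}}{\ell}\, |\mathcal{M}(\boldsymbol{r})|}{|\mathcal{S}^e(\ell)|},$$

where the expectation is taken over all possible choices of $\mathcal{E} \in \mathcal{S}^e(\ell)$. Recall that $|\mathcal{M}(\boldsymbol{r})| = \frac{n!}{\prod_{i=1}^m r_i!}$. 
Further, $|\mathcal{S}^e(\ell)|$ and $|\mathcal{S}^z(k)|$ can be calculated using Lemma 3. 

Proof See Appendix 9.1.2. ■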
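
For small parameters, the fixed-at-zero half of Proposition 1 can be checked numerically. The Python sketch below averages the number of surviving multipermutation matrices over every choice of $k$ distinct fixed-at-zero entries and compares the result with $\binom{nm-n}{k}|\mathcal{M}(\boldsymbol{r})| / \binom{nm}{k}$, using exact rational arithmetic. The function names are hypothetical, and the exhaustive enumeration is only meant for toy sizes.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

def supports(r):
    """Supports {(i, j) : X_ij = 1} of all multipermutation matrices for r."""
    iota = [i for i, mult in enumerate(r) for _ in range(mult)]
    return [frozenset((xj, j) for j, xj in enumerate(x))
            for x in set(permutations(iota))]

def average_size_by_enumeration(r, k):
    """Average code size over all sets Z of k distinct fixed-at-zero entries."""
    m, n = len(r), sum(r)
    entries = [(i, j) for i in range(m) for j in range(n)]
    mats = supports(r)
    total, choices = 0, 0
    for Z in combinations(entries, k):
        Zs = set(Z)
        total += sum(1 for ones in mats if ones.isdisjoint(Zs))
        choices += 1
    return Fraction(total, choices)

def average_size_by_formula(r, k):
    """binom(nm - n, k) * |M(r)| / binom(nm, k)."""
    m, n = len(r), sum(r)
    num_matrices = factorial(n)
    for mult in r:
        num_matrices //= factorial(mult)
    return Fraction(comb(n * m - n, k) * num_matrices, comb(n * m, k))

for r, k in [((2, 1), 2), ((1, 1, 1), 3), ((2, 2), 1)]:
    print(r, k, average_size_by_enumeration(r, k), average_size_by_formula(r, k))
```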

We now study the distance properties of these codes. We are particularly interested in the 
Chebyshev distance of $r$-regular multipermutations, for we can directly compare our results to the 
distance property of ST codes. Let $\boldsymbol{y}$ be a fixed multipermutation that may or may not be a 
codeword. Following the terminology used in [16], we refer to $\boldsymbol{y}$ as the “origin” multipermutation; we 
consider the Chebyshev distance from the fixed $\boldsymbol{y}$ to other codewords. We use $\boldsymbol{t} = (1, 2, \dots, m)$ as 
the initial vector.


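Proposition 2 Let $\boldsymbol{r} = (r, r, \dots, r)$ and define 

$$L_d(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})) := |\{X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}) \mid d_\infty(\boldsymbol{t}X, \boldsymbol{y}) \le d\}|.$$

Then, 

$$\frac{\binom{nm-n}{k}}{|\mathcal{S}^z(k)|} \cdot \frac{(2dr+r)^n\, n!}{2^{2dr}\, n^n\, (r!)^m} \le \mathbb{E}[L_d(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}))] \le \frac{\binom{nm-n}{k}}{|\mathcal{S}^z(k)|} \cdot \frac{[(2dr+r)!]^{\frac{n}{2dr+r}}}{(r!)^m}, \tag{4.6}$$

where the expectation is taken over all possible choices of $\mathcal{Z}$. Using the same notation, 

$$\frac{\binom{\binom{nm-n}{2} + \binom{n}{2}}{\ell}}{|\mathcal{S}^e(\ell)|} \cdot \frac{(2dr+r)^n\, n!}{2^{2dr}\, n^n\, (r!)^m} \le \mathbb{E}[L_d(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{E}))] \le \frac{\binom{\binom{nm-n}{2} + \binom{n}{2}}{\ell}}{|\mathcal{S}^e(\ell)|} \cdot \frac{[(2dr+r)!]^{\frac{n}{2dr+r}}}{(r!)^m}, \tag{4.7}$$

where the expectation is taken over all possible choices of $\mathcal{E}$. 

Proof See Appendix 9.1.3. ■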
5 Channel model and LP decoding 


In the previous section we showed how to construct codes by placing linear constraints on 
multipermutation matrices. Recall that in Theorem [2] we characterized the convex hull of multipermutation 
matrices. We now leverage this characterization to develop two linear programming decoding 
problems. By relaxing the ML decoding integer program, we first formulate a linear program decoding 
problem that is suitable for arbitrary memoryless channels. The objective function of this LP is 
based on log-likelihood ratios, and is analogous to LP decoding of non-binary low-density 
parity-check (LDPC) codes, which was introduced by Flanagan et al. in [31]. If we apply this formulation 
to the AWGN channel, the resulting problem is analogous to the one developed in [16]. The second 
problem we introduce has not, to the best of our knowledge, appeared in the literature, and it can be 
applied to channels that are not memoryless. In this problem, we relax the minimum Chebyshev distance 
decoding problem to a linear program by introducing an auxiliary variable.


5.1 LP decoding for memoryless channels 

We first focus on memoryless channels where $\Sigma$ is the channel output space. Since the initial vector 
$\boldsymbol{t}$ is assumed to contain distinct entries, the channel input space is $S = \{t_1, \dots, t_m\}$. Without loss 
of generality, we assume that $t_1 < t_2 < \cdots < t_m$. Let $\boldsymbol{x}$ be a codeword from an LP-decodable 
multipermutation code that is transmitted over a memoryless channel. Let $\boldsymbol{y}$ be the received word. 
Then, $P_{\Sigma^n|S^n}(\boldsymbol{y}|\boldsymbol{x}) = \prod_{i=1}^n P_{\Sigma|S}(y_i|x_i)$. For this channel model, we define a function $\gamma : \Sigma \to \mathbb{R}^m$, 
where $\gamma(y)$ is a length-$m$ row vector defined by $\gamma_i(y) = \log\left(\frac{1}{P_{\Sigma|S}(y|t_i)}\right)$. Further, we let 
$\Gamma(\boldsymbol{y}) = (\gamma(y_1) \mid \cdots \mid \gamma(y_n)) \in \mathbb{R}^{mn}$ be the concatenation of the $n$ row vectors $\gamma(y_1), \dots, \gamma(y_n)$.

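Then, ML decoding can be written as 

$$\begin{aligned}
\hat{\boldsymbol{x}} &= \operatorname*{argmax}_{\boldsymbol{x} \in \Lambda^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq, \boldsymbol{t})} P_{\Sigma^n|S^n}(\boldsymbol{y}|\boldsymbol{x}) \\
&= \operatorname*{argmax}_{\boldsymbol{x} \in \Lambda^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq, \boldsymbol{t})} \sum_{i=1}^n \log P_{\Sigma|S}(y_i|x_i) \\
&\overset{(a)}{=} \boldsymbol{t} \Big( \operatorname*{argmin}_{X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)} \sum_{i=1}^n \gamma(y_i) X_i^c \Big) \\
&\overset{(b)}{=} \boldsymbol{t} \Big( \operatorname*{argmin}_{X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)} \Gamma(\boldsymbol{y}) \operatorname{vec}(X) \Big),
\end{aligned}$$

where $X_i^c$ is the $i$-th column of $X$ and the transmitted codeword is $\boldsymbol{x} = \boldsymbol{t}X$. We recall that since $X$ 
is a multipermutation matrix, $X_i^c$ is a binary column vector with a single non-zero entry. Equality 
(a) comes from the fact that for each $\boldsymbol{x} \in \Lambda^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq, \boldsymbol{t})$ there exists an $X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$ 
such that $\boldsymbol{x} = \boldsymbol{t}X$. Further, since $\gamma(y_i) X_i^c = -\log P_{\Sigma|S}(y_i|x_i)$, the maximization problem can 
be transformed to a minimization problem. Equality (b) is simply a change of notation.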

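For this problem, we can relax the integer constraints $\Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$ to linear constraints 
$\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$. Then the LP decoding problem is 

$$\begin{aligned}
\text{minimize} \quad & \Gamma(\boldsymbol{y}) \operatorname{vec}(X) \\
\text{subject to} \quad & X \in \mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq).
\end{aligned} \tag{5.1}$$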


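Theorem 3 The LP decoding problem (5.1) has an ML certificate. That is, whenever LP decoding 
(5.1) outputs an integral solution, it is the ML solution. 

Proof Suppose that $X$ is the solution of the LP decoding problem and is integral. Then $X$ is a 
multipermutation matrix and $A \operatorname{vec}(X) \trianglelefteq b$. Since the relaxation $\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$ does not add or 
remove integral vertices, $X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$. Because $X$ minimizes $\Gamma(\boldsymbol{y}) \operatorname{vec}(X)$ over the larger set 
$\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$, it also minimizes it over $\Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$. Hence $X$ attains the maximum of the ML 
decoding objective, and it is the ML solution. ■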


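Proposition 3 LP decoding (5.1) is equivalent to ML decoding for LP-decodable multipermutation 
codes defined by fixed-at-zero constraints. 


Proof As before, for simplicity, we denote by $\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})$ the code polytope of a multipermutation 
code subject to only fixed-at-zero constraints. In order to prove the proposition, it is sufficient to 
show that 

$$\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}) = \operatorname{conv}(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})). \tag{5.2}$$

If (5.2) holds, then the relaxation does not have fractional vertices and hence is tight. By the ML 
certificate (Theorem [3]), LP decoding is thus equivalent to ML decoding. Note that it is easy to 
verify by Definition [7] that 

$$\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}) \supseteq \operatorname{conv}(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})).$$

Hence, to complete the proof, we need to show that for all $Z \in \mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})$, $Z \in \operatorname{conv}(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}))$. 

Since $Z \in \mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})$, we can express $Z$ as a convex combination of multipermutation matrices 
in $\mathcal{M}(\boldsymbol{r})$. In other words, 

$$Z = \sum_{h=1}^{|\mathcal{M}(\boldsymbol{r})|} \alpha_h X_h,$$

where $X_h \in \mathcal{M}(\boldsymbol{r})$ are multipermutation matrices, and the set $\{\alpha_h\}$ is a set of convex combination 
coefficients. We split the sum into two parts: 

$$Z = \sum_{h : X_h \in \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})} \alpha_h X_h + \sum_{h : X_h \notin \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})} \alpha_h X_h.$$

Since $Z_{ij} = 0$ for all $(i,j) \in \mathcal{Z}$ and all coefficients are non-negative, $\alpha_h = 0$ for every $h$ such that 
$(X_h)_{ij} \neq 0$ for some $(i,j) \in \mathcal{Z}$. This means that $\alpha_h = 0$ for all $X_h \notin \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})$, which implies that 

$$Z = \sum_{h : X_h \in \Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z})} \alpha_h X_h.$$

This implies that $Z \in \operatorname{conv}(\Pi^{\mathsf{M}}(\boldsymbol{r}, \mathcal{Z}))$. ■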


5.1.1 The AWGN channel 

In the AWGN channel, $P_{\Sigma|S}(y|t_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - t_i)^2}{2\sigma^2}\right)$, where $\sigma^2$ is the variance of the noise. 
Thus 

$$\Gamma(\boldsymbol{y}) = \phi \cdot \mathbf{1}_{1 \times mn} + \theta \big( \underbrace{(y_1 - t_1)^2, \dots, (y_1 - t_m)^2}_{m}, \ \dots, \ \underbrace{(y_n - t_1)^2, \dots, (y_n - t_m)^2}_{m} \big),$$

where $\phi = \log \sqrt{2\pi\sigma^2}$ is a constant bias and $\theta = \frac{1}{2\sigma^2} > 0$ is a common scaling constant. Then 

$$\Gamma(\boldsymbol{y}) \operatorname{vec}(X) = n\phi + \theta \left( \sum_{i=1}^n y_i^2 + \sum_{i=1}^m r_i t_i^2 - 2 \boldsymbol{u} \operatorname{vec}(X) \right),$$

where $\boldsymbol{u} = (y_1 t_1, \dots, y_1 t_m, \dots, y_n t_1, \dots, y_n t_m)$. Thus 

$$\operatorname*{argmin}_X \Gamma(\boldsymbol{y}) \operatorname{vec}(X) = \operatorname*{argmax}_X \operatorname{tr}\big((\boldsymbol{y}^T \boldsymbol{t}) X\big). \tag{5.3}$$

We note that when the multiplicity vector is the all-ones vector, this formulation is the same 
as the LP decoding problem proposed in [16]. As a result, it is easy to restate the definition of 
pseudodistance and the error bound properties in [16] for LP-decodable multipermutation codes. 
We refer readers to [16, Section IV] for details.
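
The equivalence in (5.3) can be illustrated numerically. The Python sketch below evaluates the AWGN cost $\Gamma(\boldsymbol{y})\operatorname{vec}(X)$ and the correlation $\operatorname{tr}((\boldsymbol{y}^T\boldsymbol{t})X) = \sum_j y_j x_j$ over a small codebook and confirms that both select the same codeword. It assumes the initial vector $\boldsymbol{t} = (1, \dots, m)$, takes the codebook to be all multipermutations with multiplicities (2, 2), and uses an exhaustive search in place of an LP solver; the function names and the sample received vector are illustrative only.

```python
import math
from itertools import permutations

def awgn_cost(x, y, sigma2):
    """Gamma(y) vec(X) for the AWGN channel, written in terms of x = tX."""
    phi = math.log(math.sqrt(2 * math.pi * sigma2))   # constant bias
    theta = 1.0 / (2 * sigma2)                        # common scaling
    return sum(phi + theta * (yj - xj) ** 2 for xj, yj in zip(x, y))

def correlation(x, y):
    """tr((y^T t) X), which equals sum_j y_j x_j."""
    return sum(xj * yj for xj, yj in zip(x, y))

codebook = sorted(set(permutations((1, 1, 2, 2))))
y = (0.9, 2.2, 1.4, 1.1)                              # sample channel output
by_cost = min(codebook, key=lambda x: awgn_cost(x, y, sigma2=0.5))
by_corr = max(codebook, key=lambda x: correlation(x, y))
print(by_cost, by_corr)
```

Both criteria agree because every codeword has the same value of $\sum_j x_j^2$, so only the cross term depends on the codeword.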


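5.1.2 Discrete memoryless q-ary symmetric channel 


For this channel, the channel output space is the same as the input space. Namely, $S = \Sigma = 
\{t_1, \dots, t_m\}$. The transition probabilities are given by 

$$P_{\Sigma|S}(y|x) = \begin{cases} 1 - p & \text{if } y = x, \\ \frac{p}{m-1} & \text{otherwise.} \end{cases}$$

Let $\boldsymbol{e}(y)$ be a row vector such that $e_i(y) = 0$ if $y \neq t_i$ and $e_i(y) = 1$ if $y = t_i$. Further, we denote 
by $Y$ the matrix 

$$Y = [\boldsymbol{e}(y_1)^T \mid \boldsymbol{e}(y_2)^T \mid \cdots \mid \boldsymbol{e}(y_n)^T].$$

Using this notation, 

$$\gamma(y_i) = \log\left(\frac{m-1}{p}\right) \mathbf{1} + \log\left(\frac{p}{1-p} \cdot \frac{1}{m-1}\right) \boldsymbol{e}(y_i).$$

Then, 

$$\Gamma(\boldsymbol{y}) \operatorname{vec}(X) = \operatorname{tr}\left[ \log\left(\frac{m-1}{p}\right) E^T X + \log\left(\frac{p}{1-p} \cdot \frac{1}{m-1}\right) Y^T X \right],$$

where $E$ is an $m \times n$ matrix with all entries equal to one. Note that $\operatorname{tr}(E^T X) = n$ is a constant 
and $\log\left(\frac{p}{1-p} \cdot \frac{1}{m-1}\right)$ is a negative constant whenever $p < (m-1)/m$. Therefore 

$$\operatorname*{argmin}_X \Gamma(\boldsymbol{y}) \operatorname{vec}(X) = \operatorname*{argmax}_X \operatorname{tr}(Y^T X). \tag{5.4}$$

Note that this is equivalent to minimizing the Hamming distance between $X$ and $Y$ (cf. Lemma [2]).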
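
The relation behind (5.4) can be checked directly: for the symmetric channel, the cost $\Gamma(\boldsymbol{y})\operatorname{vec}(X)$ is an affine, increasing function of the Hamming distance between the codeword and the received word. The Python sketch below verifies this over a small codebook. It assumes $\boldsymbol{t} = (1, \dots, m)$ and $p < (m-1)/m$; the function names, the codebook, and the received word are illustrative choices.

```python
import math
from itertools import permutations

def qsc_cost(x, y, m, p):
    """Gamma(y) vec(X) for the symmetric channel: sum of -log P(y_j | x_j)."""
    return sum(-math.log(1 - p if xj == yj else p / (m - 1))
               for xj, yj in zip(x, y))

def hamming(x, y):
    return sum(xj != yj for xj, yj in zip(x, y))

m, p = 3, 0.1
codebook = sorted(set(permutations((1, 1, 2, 2, 3, 3))))
y = (3, 3, 2, 1, 1, 3)        # received word; need not be a multipermutation
n = len(y)
slope = math.log((1 - p) * (m - 1) / p)   # positive when p < (m - 1) / m
offset = n * math.log(1 / (1 - p))
affine = all(math.isclose(qsc_cost(x, y, m, p), offset + slope * hamming(x, y))
             for x in codebook)
print(affine, slope > 0)
```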
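5.2 LP decoding for the Chebyshev distance 

In this subsection we relax the problem of minimum Chebyshev distance decoding to a linear 
program. Minimum Chebyshev distance decoding can be written as the following optimization: 

$$\begin{aligned}
\text{minimize} \quad & \max_i |x_i - y_i| \\
\text{subject to} \quad & \boldsymbol{x} \in \Lambda^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq, \boldsymbol{t}).
\end{aligned}$$

We introduce an auxiliary variable $\delta$ and rewrite the problem as 

$$\begin{aligned}
\text{minimize} \quad & \delta \\
\text{subject to} \quad & \boldsymbol{x} \in \Lambda^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq, \boldsymbol{t}), \\
& -\delta \le x_i - y_i \le \delta \ \text{ for all } i.
\end{aligned}$$

Note that $\boldsymbol{x} = \boldsymbol{t}X$, where $X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$. Therefore the problem can be reformulated as 

$$\begin{aligned}
\text{minimize} \quad & \delta \\
\text{subject to} \quad & X \in \Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq), \\
& -\boldsymbol{\delta} \le \boldsymbol{t}X - \boldsymbol{y} \le \boldsymbol{\delta},
\end{aligned}$$

where $\boldsymbol{\delta} := (\delta, \delta, \dots, \delta)$ is a length-$n$ vector. To relax the problem to an LP, we replace 
$\Pi^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$ by $\mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq)$ and obtain 

$$\begin{aligned}
\text{minimize} \quad & \delta \\
\text{subject to} \quad & X \in \mathbb{P}^{\mathsf{M}}(\boldsymbol{r}, A, b, \trianglelefteq), \\
& -\boldsymbol{\delta} \le \boldsymbol{t}X - \boldsymbol{y} \le \boldsymbol{\delta}.
\end{aligned} \tag{5.5}$$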

We make two remarks. First, as already mentioned, due to the relaxation, the optimizer of the LP 
decoding problem may contain fractional entries. When this is the case, the decoding should be 
considered a decoding failure. However, it is not hard to observe that we can round the results 
in the hope of finding the ML solution. In this paper, we adopt a simple rounding heuristic to obtain the 
final decoding result. Let $\hat{x}_j = t_{i(j)}$, where $i(j) = \operatorname{argmax}_i X_{ij}$ for all $j = 1, \dots, n$. Note that this 
step is important for LP decoding of Chebyshev distance since the solution to (5.5) is empirically 
observed to contain many fractional entries. 

Second, both LP decoding formulations can be solved using off-the-shelf solvers such as the 
CVX toolbox [32]. However, generic LP solvers do not automatically exploit the structure of the 
LP decoding problem. In particular, the constraints in Theorem [2] can be described using factor 
graphs. We leverage this insight in Section 6.2 to develop an efficient decoding algorithm.
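
The rounding heuristic in the first remark above amounts to a column-wise argmax. A minimal Python sketch follows; the function name `round_to_word`, the tie-breaking rule (smallest row index wins), and the sample fractional matrix are assumptions made for illustration.

```python
def round_to_word(X, t):
    """Column-wise rounding: x_j = t[i(j)] with i(j) = argmax_i X[i][j].

    Ties are broken in favour of the smallest row index (an assumption).
    """
    m, n = len(X), len(X[0])
    return tuple(t[max(range(m), key=lambda i: X[i][j])] for j in range(n))

# A fractional point with two rows (symbols) and four columns (positions).
X_frac = [[0.6, 0.1, 0.5, 0.0],
          [0.4, 0.9, 0.5, 1.0]]
x_hat = round_to_word(X_frac, t=(1, 2))
print(x_hat)
```

Note that the rounded vector is not guaranteed to have the required multiplicities or to satisfy the code constraints, so a validity check after rounding may be appropriate.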

6 Encoding and decoding algorithms for LP-decodable multipermutation codes

To make our previous contributions more practical, in this section we focus on encoding and 
decoding algorithms primarily for ST codes. However, note that the decoding algorithm we develop can 
be generalized to decode all LP-decodable multipermutation codes. To the best of our knowledge, 
there have been no encoding or decoding algorithms developed for ST codes. We note that it 
is simple to derive a bounded distance decoder for ST codes by extending the decoding method 
proposed for Construction 1 in DU- However, we have not been successful in finding an encoding 
algorithm in the literature. Therefore, we first introduce a method that encodes ST codes, and 
then develop an efficient ADMM algorithm for the LP decoding problem (5.1).

6.1 An encoding algorithm for ST codes 

Formally, the encoding task for ST codes is as follows: Given a message from $\{0, \dots, |\mathcal{C}_{ST}| - 1\}$, 
where $\mathcal{C}_{ST}$ is the codebook and $|\mathcal{C}_{ST}|$ is the cardinality of the codebook, the algorithm should map 
the message index to the corresponding codeword of $\mathcal{C}_{ST}$. 

Before proceeding to the encoding algorithm, we first present a mapping between the $N = 
\frac{n!}{\prod_{i=1}^m r_i!}$ multipermutations and the integers $\{0, \dots, N - 1\}$. Denote by “rankMP(·)” the map 
from multipermutation to integer, and denote by “unrankMP(·)” the inverse map. We only describe 
rankMP(·) because it is straightforward, and because unrankMP(·) can be deduced from rankMP(·). 
To the best of our knowledge, the only previous such mapping is an unpublished online posting 
due to Savara in [33]. The mapping in [33] ranks multipermutations in lexicographical order. Our 
algorithm produces a different ordering based on a novel mixed radix number system interpretation 
of multipermutations. 

We summarize rankMP($\boldsymbol{x}$) in Algorithm 1, where $\boldsymbol{x}$ is a multipermutation parameterized by 
multiplicity vector $\boldsymbol{r}$. The intuition of the algorithm is as follows. Multipermutations can be 
considered as a mixed radix number system that has $m$ “digits”. Each digit has a “base” that is 
determined by the total number of induced combinations within the multipermutation (cf. Step 5). The digits 
themselves can be calculated using [34, Theorem L], which maps combinations to integers (Step 4). 
This process is demonstrated in Example 4.


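Algorithm 1 $M$ = rankMP($\boldsymbol{x}$) 

1: $\boldsymbol{y} \leftarrow \boldsymbol{x}$; let $n_y$ be the length of $\boldsymbol{y}$. The first base is always 1, i.e., $b_1 = 1$. 
2: for all $i = 1, \dots, m$ do 
3:   Construct the vector $(\alpha_1, \dots, \alpha_{r_i})$ such that $y_{\alpha_j + 1} = i$ for all $j = 1, \dots, r_i$. 
4:   Calculate the $i$-th digit, $a_i \leftarrow \sum_{j=1}^{r_i} \binom{\alpha_j}{j}$. 
5:   Calculate the $(i+1)$-th base, $b_{i+1} \leftarrow b_i \binom{n_y}{r_i}$. 
6:   Update $\boldsymbol{y}$ by deleting $y_{\alpha_j + 1}$ for all $j = 1, \dots, r_i$. Update $n_y$. 
7: end for 
8: $M = \sum_{i=1}^m a_i b_i$.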


As mentioned above, one can invert Algorithm 1 and obtain unrankMP(·). Note that once $\boldsymbol{r}$ is 
fixed, the bases $b_i$ are fixed. As a result, in order to compute unrankMP(·) one should use modular 
arithmetic to determine the digits $a_i$, and then invert Step [2], again using modular arithmetic. We omit the 
details but demonstrate this process in Example 4.
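
One possible reading of Algorithm 1 and its inverse is sketched below in Python. The sketch assumes symbols $1, \dots, m$, 0-based positions $\alpha_j$ (so that $y_{\alpha_j + 1}$ in the 1-based notation of the algorithm is `y[alpha_j]` in code), and cumulative bases $b_{i+1} = b_i \binom{n_y}{r_i}$. The function names `rank_mp` and `unrank_mp` and the greedy digit decoding are our own choices, not the authors' reference implementation.

```python
from itertools import permutations
from math import comb

def rank_mp(x, r):
    """Map a multipermutation x with multiplicity vector r to an integer."""
    y = list(x)
    M, base = 0, 1                                   # base plays the role of b_i
    for sym, mult in enumerate(r, start=1):
        alphas = [pos for pos, v in enumerate(y) if v == sym]   # increasing
        digit = sum(comb(a, j) for j, a in enumerate(alphas, start=1))
        M += digit * base
        base *= comb(len(y), mult)                   # next base
        y = [v for v in y if v != sym]               # delete the placed symbol
    return M

def unrank_mp(M, r):
    """Inverse of rank_mp for a fixed multiplicity vector r."""
    n = sum(r)
    free = list(range(n))                            # positions still unassigned
    x = [0] * n
    for sym, mult in enumerate(r, start=1):
        M, digit = divmod(M, comb(len(free), mult))  # peel off the current digit
        alphas = []
        for j in range(mult, 0, -1):
            a = j - 1
            while comb(a + 1, j) <= digit:           # largest a with C(a, j) <= digit
                a += 1
            digit -= comb(a, j)
            alphas.append(a)                         # collected in decreasing order
        for a in alphas:
            x[free.pop(a)] = sym
    return tuple(x)

r = (2, 2, 2)
words = set(permutations((1, 1, 2, 2, 3, 3)))
ranks = sorted(rank_mp(w, r) for w in words)
print(rank_mp((3, 3, 2, 1, 1, 2), r), unrank_mp(84, r), ranks == list(range(90)))
```

The last line checks that the 90 multipermutations with multiplicities (2, 2, 2) are mapped one-to-one onto the integers 0 through 89.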

Based on unrankMP(·), we now develop an algorithm that encodes ST codes. Let $\boldsymbol{x}$ be a codeword. 
Then, by Definition [9], $x_i \equiv i \bmod d$ for all $i$. We split $\boldsymbol{x}$ into $d$ sub-vectors, $\boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(d)}$, such that 
$\boldsymbol{x}^{(1)} = (x_1, x_{d+1}, x_{2d+1}, \dots)$, $\boldsymbol{x}^{(2)} = (x_2, x_{d+2}, x_{2d+2}, \dots)$, and so on. As a result, $\boldsymbol{x}^{(k)}$ is an $r$-regular 
multipermutation that (multi-)permutes the initial vector $\boldsymbol{t}^{(k)} = (k, d + k, 2d + k, \dots)$. This means 
that one can encode each $\boldsymbol{x}^{(k)}$, $k = 1, \dots, d$, independently. We summarize this idea in Algorithm 2. 

Algorithm 2 $\boldsymbol{x}$ = encodeST($M$) 

1: Consider the number system of radix $\frac{(ar)!}{(r!)^a}$, where $a = m/d$. Convert the integer $M$ to a vector 
   of digits denoted by $(l_1, \dots, l_d)$. 
2: for all $k = 1, \dots, d$ do 
3:   Do unrankMP($l_k$) and obtain the $r$-regular multipermutation of length $n/d$. 
4:   Apply this multipermutation to the initial vector $\boldsymbol{t}^{(k)} = (k, d + k, 2d + k, \dots)$ to obtain $\boldsymbol{x}^{(k)}$. 
5: end for 
6: Combine $\boldsymbol{x}^{(k)}$ for all $k = 1, \dots, d$ by merging entries.


Example 4 We demonstrate Algorithms 1 and 2 via examples. 

We first calculate rankMP(3,3,2,1,1,2). At the beginning, $n_y = 6$ and $b_1 = 1$. We obtain 
$(\alpha_1, \alpha_2) = (3,4)$, and thus $a_1 = \binom{3}{1} + \binom{4}{2} = 9$. Furthermore, $b_2 = \binom{6}{2} = 15$. By deleting $y_4$ and 
$y_5$, we obtain the updated $\boldsymbol{y}' = (3,3,2,2)$. Continuing the previous process, we get $(\alpha'_1, \alpha'_2) = (2,3)$ 
and $a_2 = \binom{2}{1} + \binom{3}{2} = 5$. The last digit is $a_3 = 0$ because only the symbol 3 remains. As a result, 
$M = 9 \cdot 1 + 5 \cdot 15 = 84$. In this example, 84 is expressed by two digits: $5_{15} 9_{1}$. Each digit belongs 
to a different base. 

The inverse algorithm, i.e., unrankMP(84), requires knowledge of the multiplicity vector $\boldsymbol{r} = 
(2,2,2)$. The bases are easy to determine: $b_1 = 1$ and $b_2 = \binom{6}{2}$. As a result, $a_1 = 9$ because 
$9 = 84 \bmod b_2$. Further, $a_2 = 5$ since $a_2 b_2 + a_1 b_1 = 84$. We then recover $\alpha'_2 = 3$, which is the 
largest integer such that $\binom{\alpha'_2}{2} \le 5$. As a result, by solving $\binom{\alpha'_1}{1} = 5 - \binom{3}{2}$, we obtain $\alpha'_1 = 2$. Repeating 
this process, we recover $(\alpha_1, \alpha_2) = (3,4)$. Finally, using $(\alpha_1, \alpha_2)$ and $(\alpha'_1, \alpha'_2)$, which are vectors 
that describe the position of each value, we reconstruct the multipermutation as $\boldsymbol{y} = (3,3,2,1,1,2)$. 

Next, we use Algorithm 2 to encode a message. Consider the ST code with parameters $r = 2$, 
$d = 3$, and $m = 6$. This code is of cardinality 216. Suppose we would like to encode message 
137. By converting 137 to a vector of digits with base 6, we first obtain $(l_1, l_2, l_3) = (3,4,5)$, i.e., 
$3 \cdot 6^2 + 4 \cdot 6 + 5 = 137$. For each digit $l_k$, we use unrankMP($l_k$) to calculate the corresponding 
2-regular multipermutation of length 4. The results in this step are $3 \to (1,2,2,1)$, $4 \to (2,1,2,1)$, 
and $5 \to (2,2,1,1)$. Furthermore, we obtain $\boldsymbol{t}^{(1)} = (1,4)$, $\boldsymbol{t}^{(2)} = (2,5)$, and $\boldsymbol{t}^{(3)} = (3,6)$. Combining 
these results, we obtain $\boldsymbol{x}^{(1)} = (1,4,4,1)$, $\boldsymbol{x}^{(2)} = (5,2,5,2)$, and $\boldsymbol{x}^{(3)} = (6,6,3,3)$. Therefore the 
codeword for the message 137 is $\boldsymbol{x} = (1,5,6,4,2,6,4,5,3,1,2,3)$.
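
The encoding procedure of Algorithm 2 can be sketched in Python as follows. The sketch is self-contained and therefore repeats a hypothetical `unrank_mp` helper; it assumes that the digits $(l_1, \dots, l_d)$ are ordered most significant first, as in the worked example, and that sub-vector $k$ occupies the 1-based positions $k, k + d, k + 2d, \dots$. The name `encode_st` and its argument order are illustrative.

```python
from math import comb, factorial

def unrank_mp(M, r):
    """Map an integer to a multipermutation with multiplicity vector r."""
    n = sum(r)
    free = list(range(n))
    x = [0] * n
    for sym, mult in enumerate(r, start=1):
        M, digit = divmod(M, comb(len(free), mult))
        alphas = []
        for j in range(mult, 0, -1):
            a = j - 1
            while comb(a + 1, j) <= digit:
                a += 1
            digit -= comb(a, j)
            alphas.append(a)                  # decreasing positions
        for a in alphas:
            x[free.pop(a)] = sym
    return tuple(x)

def encode_st(M, m, r, d):
    """Encode message index M into a codeword of the ST code C(r, m, d)."""
    a = m // d
    radix = factorial(a * r) // factorial(r) ** a
    digits = []
    for _ in range(d):
        M, l = divmod(M, radix)
        digits.append(l)
    digits.reverse()                          # (l_1, ..., l_d), most significant first
    x = [0] * (m * r)
    for k in range(1, d + 1):
        pattern = unrank_mp(digits[k - 1], [r] * a)    # symbols 1..a
        t_k = [k + d * s for s in range(a)]            # (k, d + k, 2d + k, ...)
        x[k - 1::d] = [t_k[p - 1] for p in pattern]    # interleave sub-vector k
    return tuple(x)

print(encode_st(137, m=6, r=2, d=3))
```

For the parameters of Example 4 the sketch reproduces the codeword derived there, and distinct message indices in $\{0, \dots, 215\}$ yield distinct codewords.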

6.2 LP decoding of linearly constrained multipermutation codes using the alternating 
direction method of multipliers (ADMM) 

In this subsection, we formulate the LP decoding problem (5.1) as an instance of ADMM. We 
first introduce a factor graph representation for codes constrained by fixed-at-zero and/or 
fixed-at-equality constraints. Then, we reformulate the decoding problem in the template of ADMM.

6.2.1 Factor graph representation 

By Definition [3], a multipermutation matrix is a binary matrix satisfying $m$ row sum constraints 
and $n$ column sum constraints. This constraint satisfaction problem can be represented using $mn$ 
variable nodes, $m$ row-sum-check nodes, and $n$ column-sum-check nodes. It can be drawn as a 
graph with circles representing variable nodes, squares representing row-sum-check nodes, and 
triangles representing column-sum-check nodes. 

Figure 1: Factor graph of multipermutation matrices parameterized by $\boldsymbol{r} = (1,2,1)$. $X_{31}$, 
$X_{13}$, and $X_{24}$ are highlighted. 

Figure 2: Factor graph of multipermutation matrices parameterized by $\boldsymbol{r} = (1,2,1)$. In 
addition, $X_{31} = 0$ and $X_{13} = X_{24}$.


When additional constraints are enforced by Definition [5], the factor graph needs to be modified 
to reflect these added constraints. In particular, if fixed-at-zero constraints are used, then we 
delete all variable nodes that correspond to entries $(i,j) \in \mathcal{Z}$. On the other hand, if 
fixed-at-equality constraints are used, then for each pair $((i,j),(k,l)) \in \mathcal{E}$, we delete node $(k,l)$ and reconnect 
the edges originally connected to $(k,l)$, so that they are connected to $(i,j)$. We illustrate this process 
in Example 5. 

Example 5 Consider multipermutation matrices parameterized by $\boldsymbol{r} = (1,2,1)$. In Figure 1, we 
draw the factor graph for the set of all multipermutation matrices. If, in addition, $X_{31} = 0$ and 
$X_{13} = X_{24}$, then we delete nodes (3,1) and (2,4), and modify the edges originally connected to node 
(2,4) so that they are connected to node (1,3). The resulting graph is shown in Figure 2.
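
The graph modification described in Example 5 can be sketched in Python as an adjacency map from check nodes to variable nodes. This is an illustrative data structure rather than the authors' implementation: the function name `factor_graph`, the tuple labels for checks, and the representation of equalities as ordered pairs (keep, delete) are assumptions, and chained equalities are not handled.

```python
def factor_graph(r, n, zeros=(), equalities=()):
    """Return {check: [variable nodes]} after applying both constraint types.

    Variable nodes are (i, j) with 1-based row i and column j; checks are
    ('row', i) and ('col', j). Each element of `equalities` is a pair
    ((i, j), (k, l)) meaning X_ij = X_kl; node (k, l) is merged into (i, j).
    """
    m = len(r)
    merged_into = {second: first for first, second in equalities}
    graph = {('row', i): [] for i in range(1, m + 1)}
    graph.update({('col', j): [] for j in range(1, n + 1)})
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if (i, j) in zeros:
                continue                              # fixed-at-zero: node deleted
            node = merged_into.get((i, j), (i, j))    # fixed-at-equality: reconnect
            graph[('row', i)].append(node)
            graph[('col', j)].append(node)
    return graph

g = factor_graph(r=(1, 2, 1), n=4, zeros={(3, 1)},
                 equalities=[((1, 3), (2, 4))])
for check in sorted(g):
    print(check, g[check])
```

In the output, node (1,3) appears under row checks 1 and 2 and under column checks 3 and 4, which mirrors the reconnection of edges described in the example.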

6.2.2 ADMM algorithm for the LP decoding problem (5.1)

ADMM based LP decoding of binary LDPC codes is introduced in [24] . The ideas developed in [24] 
motivate us to develop an ADMM algorithm for decoding LP-decodable multipermutation codes. 
We first introduce some notation that is useful in deriving the algorithm, and then state the ADMM 
formulation. 

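First, for compactness, we use $\boldsymbol{\gamma}$ and $\boldsymbol{x}$ to represent $\Gamma(\boldsymbol{y})$ and $\operatorname{vec}(X)$ respectively. Next, we 
introduce selection matrices $P_j^c$, $j = 1, \dots, n$, such that $P_j^c \boldsymbol{x}$ selects entries from $\boldsymbol{x}$ that 
participate in the $j$-th column-sum-check. Similarly, let $P_i^r$, $i = 1, \dots, m$, be selection matrices, each 
of which selects entries from $\boldsymbol{x}$ that participate in the corresponding row-sum-check. Finally, 
we denote by $\Delta_m$ the standard $m$-simplex, i.e., the polytope defined by $\Delta_m = \{(x_1, \dots, x_m) \in 
\mathbb{R}^m \mid \sum_{k=1}^m x_k = 1, \text{ and } x_k \ge 0 \text{ for all } k\}$. Furthermore, let $\mathbb{U}_n^{r_i}$ be defined as follows: $\mathbb{U}_n^{r_i} = 
\{(x_1, \dots, x_n) \in \mathbb{R}^n \mid \sum_{k=1}^n x_k = r_i \text{ and } 1 \ge x_k \ge 0 \text{ for all } k\}$. 

Equipped with the notation above, we rewrite (5.1) as 

$$\begin{aligned}
\text{minimize} \quad & \boldsymbol{\gamma}^T \boldsymbol{x} \\
\text{subject to} \quad & P_i^r \boldsymbol{x} \in \mathbb{U}_n^{r_i}, \ \forall i = 1, \dots, m, \\
& P_j^c \boldsymbol{x} \in \Delta_m, \ \forall j = 1, \dots, n, \\
& A\boldsymbol{x} \trianglelefteq b.
\end{aligned} \tag{6.1}$$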

The next step in ADMM is to exploit the structure of the constraint $A\boldsymbol{x} \trianglelefteq b$. For example, 
when the code is only constrained by fixed-at-zero and fixed-at-equality constraints, the constraints 
in (6.1) can be translated to a modified factor graph as discussed in Section 6.2.1. This translation 
results in the following changes to (6.1). First, the matrices $P_i^r$ and $P_j^c$ should be changed to match the 
modified factor graph. Then, the parameters of $\mathbb{U}$ and $\Delta$ should be revised accordingly. Finally, 
“$A \operatorname{vec}(X) \trianglelefteq b$” can be removed since the two steps above are sufficient to describe this type of 
constraint set.

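For simplicity, we use an ST code with parameters $r$, $d$, and $m$ to illustrate the ADMM based 
decoding algorithm. Note that this formulation is easily extended to other codes with fixed-at-zero 
and fixed-at-equality constraints. Using the techniques developed in [21], we introduce replicas $\boldsymbol{z}^c$ 
and $\boldsymbol{z}^r$ to rewrite problem (6.1) as 

$$\begin{aligned}
\text{minimize} \quad & \boldsymbol{\gamma}^T \boldsymbol{x} \\
\text{subject to} \quad & P_i^r \boldsymbol{x} = \boldsymbol{z}_i^r, \quad P_j^c \boldsymbol{x} = \boldsymbol{z}_j^c, \\
& \boldsymbol{z}_i^r \in \mathbb{U}_{n/d}^{r}, \quad \boldsymbol{z}_j^c \in \Delta_{m/d}.
\end{aligned} \tag{6.2}$$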

Then, the augmented Lagrangian used in ADMM is 

$$\begin{aligned}
\mathcal{L}_\mu(\boldsymbol{x}, \boldsymbol{z}^r, \boldsymbol{z}^c, \boldsymbol{\lambda}, \boldsymbol{\eta}) = \boldsymbol{\gamma}^T \boldsymbol{x} 
&+ \sum_j \boldsymbol{\lambda}_j^T (P_j^c \boldsymbol{x} - \boldsymbol{z}_j^c) + \frac{\mu}{2} \sum_j \| P_j^c \boldsymbol{x} - \boldsymbol{z}_j^c \|_2^2 \\
&+ \sum_i \boldsymbol{\eta}_i^T (P_i^r \boldsymbol{x} - \boldsymbol{z}_i^r) + \frac{\mu}{2} \sum_i \| P_i^r \boldsymbol{x} - \boldsymbol{z}_i^r \|_2^2.
\end{aligned}$$

The ADMM algorithm minimizes $\mathcal{L}_\mu(\boldsymbol{x}, \boldsymbol{z}^r, \boldsymbol{z}^c, \boldsymbol{\lambda}, \boldsymbol{\eta})$ in an iterative fashion similar to the one in [23], 
and hence we omit the details. Instead, we make four remarks. 

1. Although there are two symbols for replicas, $\boldsymbol{z}^r$ and $\boldsymbol{z}^c$, one can concatenate $\boldsymbol{z}^r$ and $\boldsymbol{z}^c$ to 
form one vector. Doing so does not change the algorithm. 

2. The resulting $\boldsymbol{x}$-update step (cf. [23]) for (6.2) is an average of the corresponding replicas plus 
a bias from $\boldsymbol{\gamma}$. 

3. The resulting $\boldsymbol{z}$-update step (cf. [23]) requires two types of projections: projection onto $\Delta_m$ 
and projection onto $\mathbb{U}_n^{r_i}$. The first projection can be solved in linear time using techniques 
developed in [29]. The second projection can be solved in linear time using Algorithm [3], 
which is presented in detail in Appendix 10 and is developed based on ideas first proposed in [35]. 

4. Each ADMM iteration consists of an $O(mn)$ $\boldsymbol{x}$-update, $m$ projections onto $\mathbb{U}_n^{r_i}$, $n$ projections 
onto $\Delta_m$, and an $O(mn)$ $\boldsymbol{\lambda}$-update. Therefore, the computational complexity per iteration is 
$O(mn)$. Due to the convergence result in [24, Proposition 1], the ADMM algorithm in this 
section has time complexity on the order of $mn$.
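
To make the first projection in remark 3 concrete, the Python sketch below computes the Euclidean projection onto $\Delta_m$ with the widely used sort-and-threshold method. This is a swapped-in technique that runs in $O(m \log m)$ time because of the sort; it is not the linear-time method cited as [29], and the function name `project_simplex` is hypothetical. The projection onto $\mathbb{U}_n^{r_i}$ (Algorithm 3 of the paper) is not reproduced here.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : sum(x) = 1, x >= 0}.

    Sort-based method: find the largest k such that the k-th largest entry
    stays positive after subtracting the common shift theta.
    """
    u = sorted(v, reverse=True)
    running_sum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate          # the last k that passes gives the shift
    return [max(v_i - theta, 0.0) for v_i in v]

print(project_simplex([0.5, 0.5]))
print(project_simplex([2.0, 0.0]))
print(project_simplex([0.3, -0.2, 1.4]))
```

In an ADMM iteration each column replica would be replaced by such a projection of the corresponding shifted vector; the exact update order and scaling follow the cited references and are not restated here.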

We note that ADMM can also decode LP-decodable permutation codes, e.g., the pure 
involution code introduced in [16]. Further, as demonstrated in [24], ADMM can decode long block 
length LDPC codes efficiently. Therefore, a promising direction for future work is to design long block 
length multipermutation codes with sparse structure.

7 Numerical results 

In this section we present simulation results for ST codes with various parameter settings. We 
simulate the AWGN channel following the methodology adopted in [16] and [2TJ. We present 
results comparing five classes of decoders as follows: 

• LP decoding (5.1) (denoted by “LP AWGN”). Note that by Proposition [3], LP decoding is 
equivalent to ML decoding for ST codes. We also verify this claim empirically by implementing 
ML decoding via an exhaustive search (denoted by “ML AWGN”). 

• Minimum distance decoding (denoted by “Minimum distance”). We first rank the channel 
outputs to form a multipermutation that has the same multiplicity vector as the codebook. 
Then, we minimize the Chebyshev distance via an exhaustive search over all codewords. 

• Soft LP decoding of Chebyshev distance (denoted by “LP Chebyshev, soft”). We take the 
channel output as $\boldsymbol{y}$ in (5.5) and solve (5.5). Note that $\boldsymbol{y}$ is a real-valued vector and is not 
necessarily a multipermutation. The distance we minimize is the infinity norm between two 
real-valued vectors. 

• Hard LP decoding of Chebyshev distance (denoted by “LP Chebyshev, hard”). We first rank 
the channel outputs to a multipermutation with the same multiplicity vector as the codebook. 
Then, we use this ranking as $\boldsymbol{y}$ for problem (5.5).

• Bounded distance decoding (denoted by “Bounded distance”). We first rank the channel 
outputs to a multipermutation with the same multiplicity vector as the codebook. Then, we 
search for the unique codeword within radius d/2 in the Chebyshev metric, where d is the 
minimum Chebyshev distance of the ST code. We declare an error if no codeword is found 
or if more than one codeword is found. As mentioned in Section 6, by extending the decoding 
method for Construction 1 introduced in HU, one can construct an efficient bounded distance 
decoder for ST codes that corrects (d — l)/2 errors when d is odd. 
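
Several of the decoders above first rank the channel outputs to obtain a multipermutation with the same multiplicity vector as the codebook. One plausible way to do this is sketched below in Python: the smallest $r_1$ outputs are mapped to symbol 1, the next $r_2$ to symbol 2, and so on. The sketch assumes the initial vector $\boldsymbol{t} = (1, \dots, m)$ and breaks ties by position; the function name `rank_to_multipermutation` is hypothetical, and the paper does not specify these details.

```python
def rank_to_multipermutation(y, r):
    """Quantize real-valued outputs y to a multipermutation with multiplicities r.

    Positions are sorted by received value; the smallest r[0] positions get
    symbol 1, the next r[1] positions get symbol 2, and so on. Ties are
    broken by position because Python's sort is stable (an assumption).
    """
    order = sorted(range(len(y)), key=lambda j: y[j])
    x = [0] * len(y)
    start = 0
    for sym, mult in enumerate(r, start=1):
        for j in order[start:start + mult]:
            x[j] = sym
        start += mult
    return tuple(x)

print(rank_to_multipermutation([0.2, 3.1, 1.9, 1.2], r=(2, 2)))
```

The resulting vector can then be passed to minimum distance decoding, bounded distance decoding, or the hard variant of LP decoding for the Chebyshev distance.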

We first simulate the ST code with parameters $r = 2$, $d = 3$, and $m = 6$. This means that each 
codeword is of length 12 and that bounded distance decoding can correct 1 error in the Chebyshev 
metric. In Figure 3, we plot the word-error-rate (WER) as a function of signal-to-noise ratio (SNR)$^6$ 
for the codeword (1,2,3,4,5,6,1,2,3,4,5,6), where each data point is based on 100 word errors. 
As this is a nonlinear code, it is important to note that the WER performance is not necessarily 
the same across different codewords. Since the code is proven to correct some number of errors, our 
intention here is to demonstrate that LP decoding performs much better than bounded distance 
decoding. Nevertheless, we also simulated several other random codewords. We observed that, for 
the set of codewords we simulated, the WER does not vary significantly across different codewords 
(data not shown). 

Figure 3: Word-error-rate (WER) plotted as a function of signal-to-noise ratio (SNR) for 
the ST code with parameters $r = 2$, $d = 3$, and $m = 6$. The codeword transmitted is 
(1,2,3,4,5,6,1,2,3,4,5,6).

We make the following observations. First, LP (ML) decoding achieves a significantly lower error 
rate than the other decoders. This suggests that soft decoding is better than hard decoding in terms 
of error rates. The difference between the WER plots of LP and ML decoding that one may observe in 
Figure [3] is due to statistical fluctuation. Second, soft and hard LP decoding of Chebyshev distance 
suffer a 2 to 4 dB loss when compared to minimum distance decoding. In addition, these two 
decoders both achieve WER performance similar to that of bounded distance decoding. Because 
bounded distance decoding is much more computationally efficient, for this code, we prefer bounded 
distance decoding to LP decoding of Chebyshev distance.

Next, we consider the ST code with parameters $r = 3$, $d = 4$, and $m = 16$. We plot WER as 
a function of SNR in Figure 4. This code has a block length of 48, which is larger than the first 
code, making exhaustive search expensive to implement. As a result, we do not present results for 
the exhaustive-search-based minimum distance decoding. However, we can implement ML decoding 
via ADMM. In ADMM, we set $\mu = 5.5$ and the maximum number of iterations $T_{\max} = 200$. We 
observe that, unlike in Figure 3, both soft and hard LP decoding of Chebyshev distance significantly 
outperform bounded distance decoding. However, their performance is about 2 to 3 dB worse than 
LP decoding (5.1). From a computational complexity point of view, we note that although the 
maximum number of iterations is set to 200, the average number of iterations observed was less 
than 50 at all SNRs simulated. We omit the details since the behavior is similar to that for ADMM 
decoding of LDPC codes. We refer the reader to [24] for an extensive discussion of the implementation 
of ADMM for binary LDPC codes. 

6 SNR is defined by $10 \log_{10}$ …, where $\sigma^2$ is the variance of the Gaussian noise. 

Figure 4: Word-error-rate (WER) plotted as a function of signal-to-noise ratio (SNR) for 
the ST code with parameters $r = 3$, $d = 4$, and $m = 16$. The codeword transmitted is 
(1, ..., 16, 1, ..., 16, 1, ..., 16).

8 Conclusions 

In this paper, we develop several fundamental tools of a framework for codes based on 
multipermutations. 

We first develop new theory. We propose representing multipermutations using binary matrices 
that we term multipermutation matrices. Using multipermutation matrices, we define LP-decodable 
multipermutation codes. In order to decode these codes using LP decoding, we characterize the 
convex hull of multipermutation matrices, which is analogous to the Birkhoff polytope of permutation 
matrices. Using this result, we relax the code constraints and formulate two LP decoding problems. 
The first decoding problem minimizes the ML decoding objective. It can be applied to arbitrary 
memoryless channels. The second decoding problem minimizes the Chebyshev distance. 

To make these contributions useful in practice, we also develop new algorithms. We first develop 
a mixed radix number system interpretation for multipermutation. We use it to develop an efficient 
encoding algorithm for ST codes. Regarding decoding algorithms, we reformulate the LP decoding 
problem and use ADMM to solve it. The resulting ADMM formulation requires two projection 
subroutines that can be solved efficiently using techniques drawn from the literature. 

These contributions result in two major advantages for LP-decodable multipermutation codes. 
First, both LP decoding problems presented in this paper are computationally tractable. In 
particular, the LP decoding problem for memoryless channels can be solved efficiently using ADMM. 
Second, our simulation results indicate that LP decoding can achieve significantly lower error rates 
than hard decoding algorithms such as bounded distance decoding. 

The above two advantages lead to new research directions: The first is the design of good 
codes. ADMM LP decoding is simple and efficient for codes with fixed-at-zero and fixed-at-equality 
constraints. Consequently, code designs that use these two types of constraints can be decoded 
efficiently using LP decoding. In fact, we already know some codes that benefit from the algorithm, 
e.g., ST codes and pure involution codes. 

Second, although soft decoding can achieve lower error rates than hard decoding, soft decoding 
requires knowledge of the initial vector. However, in many situations, such knowledge is missing. As 
an example, in rank modulation, the decoder does not know the exact values of the initial vector 
determined at the cell programming stage. It hence cannot calculate the log-likelihood ratios 
of (5.1). Therefore, it is important to develop decoding algorithms that can deal with uncertainty 
in the initial vector. We present some initial ideas along these lines in Appendix 11. 

Appendix 

9 Random coding results 

9.1 Proofs of random coding results 

9.1.1 Proof of Lemma 4 

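If $\mathcal{Z}$ is such that the fixed $X \in \Pi^{M}(r, \mathcal{Z})$, then for all $(i,j) \in \mathcal{Z}$, $X_{ij} = 0$. Since there are $n$ 
ones in $X$, $\mathcal{Z}$ has to be a subset of the remaining $mn - n$ entries. In addition, since $|\mathcal{Z}| = k$, there 
are a total of $\binom{mn-n}{k}$ possible choices. 

For the second result, if $\mathcal{E}$ is such that the fixed $X \in \Pi^{M}(r, \mathcal{E})$, then for all $((i,j),(k,l)) \in \mathcal{E}$, 
$X_{ij} = X_{kl}$. This means that either $X_{ij} = X_{kl} = 0$ or $X_{ij} = X_{kl} = 1$. The set of all possible 
fixed-at-equality constraints is then given by the pairs of entries on which $X$ takes equal values, of 
which there are $\binom{mn-n}{2} + \binom{n}{2}$. Therefore, this set has $\binom{\binom{mn-n}{2} + \binom{n}{2}}{t}$ size-$t$ subsets. 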
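
The first count can be checked by brute force on a small instance. The sketch below is illustrative only: the function name `count_compatible_zero_sets` and the toy matrix are assumptions introduced here, and the matrix is taken to be $m \times n$ with exactly one 1 per column, as the counting argument above suggests.

```python
from itertools import combinations
from math import comb

def count_compatible_zero_sets(X, k):
    """Count size-k position sets Z such that X[i][j] == 0 for all (i, j) in Z."""
    m, n = len(X), len(X[0])
    positions = [(i, j) for i in range(m) for j in range(n)]
    return sum(
        all(X[i][j] == 0 for (i, j) in Z)
        for Z in combinations(positions, k)
    )

# Toy multipermutation matrix (hypothetical): m = 2 symbols, each used
# r = 2 times, so n = 4. Column j has a single 1 in the row of the symbol
# stored at position j.
X = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
m, n, k = 2, 4, 3
brute = count_compatible_zero_sets(X, k)   # exhaustive enumeration
closed_form = comb(m * n - n, k)           # the count stated in the proof
```

Here the exhaustive count and the binomial coefficient agree because every admissible set must avoid the $n$ ones of $X$.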

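9.1.2 Proof of Proposition 1 

$$
\begin{aligned}
E[A(\Pi^{M}(r, \mathcal{Z}))] &= \sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} P(\mathcal{Z})\, A(\Pi^{M}(r, \mathcal{Z})) \\
&= \sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} P(\mathcal{Z}) \sum_{X \in \mathcal{M}(r)} \mathbb{I}(X \in \Pi^{M}(r, \mathcal{Z})),
\end{aligned}
$$

where $\mathbb{I}(\cdot)$ is the indicator function. By Lemma 4, 

$$
\sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} \mathbb{I}(X \in \Pi^{M}(r, \mathcal{Z})) = \binom{nm-n}{k}.
$$

Then 

$$
E[A(\Pi^{M}(r, \mathcal{Z}))] = \frac{1}{|S_{\mathcal{Z}}(k)|} \sum_{X \in \mathcal{M}(r)} \binom{nm-n}{k} = \frac{|\mathcal{M}(r)| \binom{nm-n}{k}}{|S_{\mathcal{Z}}(k)|}.
$$

In the same vein, we can prove that 

$$
E[A(\Pi^{M}(r, \mathcal{E}))] = \frac{\binom{\binom{nm-n}{2} + \binom{n}{2}}{t}\, |\mathcal{M}(r)|}{|S_{\mathcal{E}}(t)|}.
$$

9.1.3 Proof of Proposition 2 

$$
\begin{aligned}
E[L_d(\Pi^{M}(r, \mathcal{Z}))] &= \sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} P(\mathcal{Z})\, L_d(\Pi^{M}(r, \mathcal{Z})) \\
&= \sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} P(\mathcal{Z}) \sum_{X \in \mathcal{M}(r)} \mathbb{I}\bigl(X \in \Pi^{M}(r, \mathcal{Z}) \text{ and } d_\infty(tX, y) \le d\bigr) \\
&= \frac{1}{|S_{\mathcal{Z}}(k)|} \sum_{\substack{X \in \mathcal{M}(r) \\ d_\infty(tX, y) \le d}} \; \sum_{\mathcal{Z} \in S_{\mathcal{Z}}(k)} \mathbb{I}(X \in \Pi^{M}(r, \mathcal{Z})) \\
&= \frac{\binom{nm-n}{k}}{|S_{\mathcal{Z}}(k)|}\, V_\infty(r, n, d),
\end{aligned}
$$

where $V_\infty(r, n, d)$ is the number of elements that are $d$-close to a vector in the Chebyshev metric. By 
Lemmas 1-3 in [23], $V_\infty(r, n, d)$ can be bounded by 

$$
\frac{(2dr + r)^n\, n!}{2^{2dr}\, n^n\, (r!)^m} \le V_\infty(r, n, d) \le \frac{[(2dr + r)!]^{\frac{n}{2dr + r}}}{(r!)^m}.
$$

Therefore, we obtain (4.6). Note that (4.7) can be obtained in a similar way and thus we omit the 
details. 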
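
The expected-cardinality expression from the proof of Proposition 1 can be compared against exhaustive enumeration for tiny parameters. This is a sketch under stated assumptions: the zero set is taken to be uniformly distributed over all size-$k$ subsets of the $mn$ matrix positions (so $|S_{\mathcal{Z}}(k)| = \binom{mn}{k}$), every symbol has the same multiplicity $r$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

def multipermutations(m, r):
    """All distinct arrangements of the multiset {0^r, 1^r, ..., (m-1)^r}."""
    base = [s for s in range(m) for _ in range(r)]
    return sorted(set(permutations(base)))

def expected_size_bruteforce(m, r, k):
    """Average, over all size-k zero sets Z, of the number of codewords
    whose matrix is zero on Z (Z assumed uniformly distributed)."""
    n = m * r
    positions = [(i, j) for i in range(m) for j in range(n)]
    words = multipermutations(m, r)
    total = 0
    num_sets = 0
    for Z in combinations(positions, k):
        num_sets += 1
        # w[j] == i means X[i][j] == 1, so (i, j) in Z rules the word out.
        total += sum(all(w[j] != i for (i, j) in Z) for w in words)
    return Fraction(total, num_sets)

def expected_size_formula(m, r, k):
    """|M(r)| * C(mn - n, k) / C(mn, k), with |M(r)| = n! / (r!)^m."""
    n = m * r
    size_M = factorial(n) // factorial(r) ** m
    return Fraction(size_M * comb(m * n - n, k), comb(m * n, k))
```

Exact rational arithmetic via `Fraction` is used so that the two quantities can be compared without floating-point tolerance.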


9.2 Numerical results on random coding ensemble 

In this appendix, we compare ST codes with the results obtained in Section 4.3. Recall that ST 
codes are codes with only fixed-at-zero constraints, and we denote the parameters by $d_{ST}$, $m_{ST}$, 
and $r_{ST}$ (cf. Definition 9 for the role of each of these parameters). For each set of parameters, 
the cardinality $A_{ST}$ and distance $d_{ST}$ of the corresponding ST code are fixed. Furthermore, each set 
of parameters corresponds to a unique set of entries that are fixed to zero. We denote this set of 
entries by $\mathcal{Z}_{ST}$. In addition, we let $k_{ST} = |\mathcal{Z}_{ST}|$. 

We conduct two sets of numerical experiments. In the first set of experiments, we study the 
scaling of code sizes with the number of fixed-at-zero constraints. First, we compare ST codes 
with random codes by letting both have the same number of fixed-at-zero constraints. Next, we 
conduct the reverse experiment by letting both have the same code cardinality. In other words, 
for each triple $(d_{ST}, m_{ST}, r_{ST})$, we let $k_r$ be the largest number of fixed-at-zero constraints such 
that $E[A(\Pi^{M}(r, \mathcal{Z}))] \ge A_{ST}$. For both experiments, we scale $d_{ST}$ while fixing $r_{ST}$ and the 
ratio $m_{ST}/d_{ST}$. Therefore, $m_{ST}$ scales with $d_{ST}$. Since neither experiment compares the distance 
properties, we conduct a second set of experiments. We first obtain $k_r$ in the same way as in the 
first set of experiments, and then use Proposition 2 to bound the average cardinality of radius-$d_r$ 
balls. In other words, we let the random coding ensemble have the same code cardinality as the 
ST code, and compare the minimum distances of the codes. We first study the scaling of ball size 
as a function of $d_{ST}$. Finally, we fix a set of code parameters and obtain the spectrum of ball sizes 
with respect to the radius $d_r$. 

In Figures 5 and 6, we plot results for the first set of experiments. In Figure 5, we define 
$C_{ST}(d_{ST}) := \log(A_{ST})/d_{ST}$ and plot $C_{ST}$ as a function of $d_{ST}$.⁷ Furthermore, since each $d_{ST}$ 
corresponds to a $k_{ST}$, we can generate the random coding ensemble using $k_{ST}$ fixed-at-zero 
constraints. Then, we let $C_r(d_{ST}) := \log(E[A(\Pi^{M}(r, \mathcal{Z}))])/d_{ST}$ and plot $C_r$ as a function of $d_{ST}$. 
It is easy to show that $C_{ST} = 18.9405$ for $r = 3$ and $m = 5 d_{ST}$, a result that is verified empirically 
by Figure 5. However, we observe from Figure 5 that $C_r(d_{ST})$ decreases as $d_{ST}$ increases. 
Nevertheless, we can show that $\lim_{d_{ST} \to \infty} C_r(d_{ST}) > 13.31$, which means that $E[A(\Pi^{M}(r, \mathcal{Z}))]$ scales 
exponentially with $d_{ST}$ asymptotically. We note that even at small $d_{ST}$, ST codes are much larger 
than the ensemble average. For example, when $d_{ST} = 5$, the number of codewords in the ST code is 
$10^6$ times larger than that of the ensemble average. Interestingly, in the reverse experiment, we 
observe from Figure 6 that the number of fixed-at-zero constraints for ST codes does not need to 
be significantly larger than the ensemble average in order to make both have the same number 
of codewords. However, changing the number of fixed-at-zero constraints does have a significant 
impact on the distance property of the code, as we demonstrate next. 



Figure 5: $C_{ST} := \log(A_{ST})/d_{ST}$ and $C_r := \log(E[A(\Pi^{M}(r, \mathcal{Z}))])/d_{ST}$ plotted as a function 
of $d_{ST}$. Both have the same number of fixed-at-zero constraints, $r = 3$ and $m = 5 d_{ST}$. 

In Figures 7 and 8, we plot the results of the second set of experiments. In Figure 7, we scale $d_{ST}$ 
while keeping $r$ fixed at 3 and $m$ at $5 d_{ST}$. We make the following remarks. First, recall that $k_{ST}$ 
is larger than $k_r$ by a relatively small number. However, we observe in Figure 7 that the expected 
number of codewords within radius $d_{ST}$ of codes with fixed-at-zero constraints, $E[L_{d_{ST}}(\Pi^{M}(r, \mathcal{Z}))]$, 
is quite large. Note that for ST codes, there is no codeword that is $d_{ST}$-close to any other codeword. 
This indicates that the distance property of ST codes is much better than the ensemble average. 
Second, we observe that the upper and lower bounds still have room for improvement. In particular, 
there is an exponentially increasing gap between the two. In Figure 8, we pick the ST code defined 
by parameters $d_{ST} = 5$, $r = 3$, $m = 30$, and $n = 90$. Using this set of parameters, we find that 
a random code design with $k_r = 2158$ zeros achieves the same average codebook size as the ST code, 
i.e., around $2.26 \cdot 10^{[\ldots]}$. We observe in Figure 8 that the upper bound on the average ball size is 
less than 1 for $d_r < 3.5$. This means that on average, there is fewer than one codeword within the 
radius-3.5 ball of any codeword. Although this result does not directly translate to the minimum 
Chebyshev distance of a code, we can still infer that the minimum distance is around 3, which is 
less than the minimum distance of the ST code, which is 5. 

⁷ We use the natural logarithm here. 

Figure 6: $k_{ST}$ and $k_r$ plotted as a function of $d_{ST}$ for random fixed-at-zero codes with 
parameters $r = 3$ and $m = 5 d_{ST}$. 

10 Projection onto the $\ell_1$ ball with box constraints 

In this appendix, we show a linear-time algorithm for projection onto the $\ell_1$ ball with box 
constraints, i.e., the set $\{x \in [0,1]^n : \sum_i x_i = r\}$. There are two key ideas in 
this algorithm. First, using Karush-Kuhn-Tucker (KKT) conditions, the projection problem can 
be transformed to a waterfilling-type problem, wherein one needs to perform a binary search over 
$2n$ possible points. Second, although one can first sort these points and then perform a binary 
search, the sorting operation is $O(n \log n)$ and is expensive. Instead, the binary search can be done 
based on linear-time median finding algorithms (e.g., [36, Sec. 8.5]). These two key ideas are used 
in both [29] and [35]. 

Algorithm 3 in this section is modified from [35, Algorithm 1] and hence is not new. We note 
that Barman et al. also worked on the problem and present their algorithm in [37]. However, 
their algorithm is based on sorting instead of median finding. Due to these previous works, we 
only briefly describe the derivations. Our goals in this section are first to correct some errors 
in [35, Algorithm 1],⁸ and second to compare with the sorting-based algorithm proposed in [37]. 
We refer readers to [37] for a nice waterfilling interpretation of the algorithm. Also note that 
switching from sorting to median finding requires tracking partial sums in each iteration, which is 
described in [35]. 

Figure 7: Upper and lower bound of $E[L_{d_{ST}}(\Pi^{M}(r, \mathcal{Z}))]$ plotted as a function of $d_{ST}$ for 
random fixed-at-zero codes with parameters $r = 3$ and $m = 5 d_{ST}$. 

Figure 8: Upper and lower bound of $E[L_{d}(\Pi^{M}(r, \mathcal{Z}))]$ plotted as a function of $d$ for random 
fixed-at-zero codes with parameters $d_{ST} = 5$, $r = 3$, $m = 30$, and $n = 90$. $k_r = 2158$ to match 
the code size of the ST code defined by the same parameters. 

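Projection onto $\{x \in [0,1]^n : \sum_i x_i = r\}$ is equivalent to the following optimization problem. 

$$
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\|v - x\|_2^2 \\
\text{subject to} \quad & 0 \le x_i \le 1 \text{ for all } i \text{ and } \sum_i x_i = r.
\end{aligned}
\tag{10.1}
$$

We introduce multipliers $\theta$, $\eta_i$, and $\nu_i$, and write the Lagrangian of (10.1) as 

$$
\mathcal{L}(x, \theta, \eta, \nu) = \tfrac{1}{2}\|v - x\|_2^2 + \theta\Bigl(\sum_i x_i - r\Bigr) - \sum_i \eta_i (1 - x_i) - \sum_i \nu_i x_i.
$$
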

Denote by $x^*$ an optimal solution and denote by $\theta^*$, $\eta^*$, and $\nu^*$ the corresponding multipliers. The 
KKT conditions imply $\nabla \mathcal{L}(x^*) = 0$, $v_i - x_i^* = \theta^* + \eta_i^* - \nu_i^*$, $\eta_i^*(1 - x_i^*) = 0$, and $\nu_i^* x_i^* = 0$. This 
means that for all $0 < x_i^* < 1$, $v_i - x_i^* = \theta^*$. In other words, for a fixed $\theta$, the indices $\{1, \dots, n\}$ are 
divided into three sets: 

• An active set $S_A$ such that for $i \in S_A$, $0 < v_i - \theta < 1$ and $x_i = v_i - \theta$. 

• A clipped set $S_C$ such that for $i \in S_C$, $v_i - \theta \ge 1$ and $x_i = 1$. 

• A zero set $S_Z$ such that for $i \in S_Z$, $v_i - \theta \le 0$ and $x_i = 0$. 

When $\theta$ varies in a range that does not change the three sets, $\sum_i x_i$ becomes a linear function 
with respect to $\theta$. On the other hand, there are some values of $\theta$ at which these sets change, 
which we term break points. The set of break points is easy to identify; it can be defined by 
$B := \bigcup_{i=1}^{n} \{v_i, v_i - 1\}$. In Algorithm 3, we perform a binary search over all break points, each 
of which corresponds to a triplet of active, clipped, and zero sets. In addition, the pivot for each binary 
search can be determined by a median finding algorithm. In Figure 9, we compare Algorithm 3 with 
the sorting-based algorithm proposed in [37]. It is easy to observe that the linear-time algorithm 
is significantly faster. 
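
The KKT characterization above already yields a simple projection routine. The sketch below is not Algorithm 3: it is a sorting-based variant that runs a plain binary search over the sorted break points and re-evaluates $\sum_i x_i$ from scratch at every pivot, which costs $O(n \log n)$ overall. The function names are hypothetical, and the target sum `r` is assumed to satisfy $0 \le r \le n$.

```python
def clipped_sum(v, theta):
    """f(theta) = sum_i min(max(v_i - theta, 0), 1); non-increasing in theta."""
    return sum(min(max(vi - theta, 0.0), 1.0) for vi in v)

def project_box_simplex(v, r):
    """Project v onto {x : 0 <= x_i <= 1, sum_i x_i = r} (sorting-based sketch)."""
    n = len(v)
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n for a nonempty feasible set")
    if n == 0:
        return []
    # Break points: values of theta at which some index changes set.
    breaks = sorted(set(v) | {vi - 1.0 for vi in v})
    # f(breaks[0]) = n >= r and f(breaks[-1]) = 0 <= r, so a bracket exists.
    lo, hi = 0, len(breaks) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if clipped_sum(v, breaks[mid]) >= r:
            lo = mid
        else:
            hi = mid
    t_lo, t_hi = breaks[lo], breaks[hi]
    f_lo, f_hi = clipped_sum(v, t_lo), clipped_sum(v, t_hi)
    if f_lo == f_hi:
        theta = t_lo
    else:
        # f is linear between adjacent break points, so interpolate.
        theta = t_lo + (f_lo - r) * (t_hi - t_lo) / (f_lo - f_hi)
    return [min(max(vi - theta, 0.0), 1.0) for vi in v]
```

For example, projecting `[0.9, 0.8, 0.1]` with `r = 1` gives $\theta^* = 0.35$, so the third index falls in the zero set and the other two are active.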


Remarks on Algorithm 3 

• Step 8 calculates the $\ell_1$ norm of the current projection vector determined by $\theta_p$ (cf. [35, 
Eq. (11)]). 

• Taking the median of the first few entries of $B$ can be good enough in practice (e.g., the first 50).⁹ 
This avoids taking the median of a huge set of numbers, which accelerates the algorithm 
for large values of $n$. 

⁸ The geometry described in [35] corresponds to the inequality constraint $\sum_i x_i \le r$. However, [35, Algorithm 1] 
actually projects onto the geometry with the equality constraint. In addition, there are several minor errors in [35, 
Algorithm 1]. 

Figure 9: Execution time plotted as a function of projection dimension. Data collected on 
an Intel(R) 2.5GHz CPU. 


11 Estimating the initial vector for rank modulation 

A key outcome of this paper is that soft decoding can outperform hard decoding and thus is 
promising in practice. However, it may be infeasible to obtain information required to set up the 
soft decoding problem as stated. For instance, this may be the situation in flash memory, where 
the initial vector is determined at the cell programming stage and is not fixed. Consequently, the 
decoder does not know the exact initial vector and hence cannot directly apply the soft decoding 
techniques developed in this paper. In this appendix, we initiate the study of these issues by 
presenting a model for estimating the initial vector given certain constraints on its uncertainty. 

Let $\mathcal{C}$ denote a codebook of multipermutation matrices. In rank modulation, the encoder 
encodes a message into a multipermutation ranking by injecting charge into memory cells. In this 
case, the initial vector $t$ is determined after the memory cells are programmed to the desired 
ranking. Let $x$ be the actual charge levels stored using rank modulation; then $x = tX$, where 
$X \in \mathcal{C}$. Due to the physical resolution limitations of flash memories, we assume that $|t_i - t_j| \ge \Delta$ 
for all distinct $i, j \in \{1, \dots, m\}$. The value $\Delta$ is assumed known by the decoder. In addition, without loss 
of generality, we may assume that $t_1 < t_2 < \cdots < t_m$. In summary, only $\mathcal{C}$ (which includes the 
values of $r$, $m$, and $n$) and $\Delta$ are known to the decoder. 

⁹ This approach uses a heuristic approximation of the true median. As a result, the total number of iterations may 
increase, but the overall execution time may decrease. 

Algorithm 3 Project vector $v$ onto $\{x \in [0,1]^n : \sum_i x_i = r\}$ 

1: Construct the set $B := \bigcup_{i=1}^{n} \{v_i, v_i - 1\}$. 
2: Construct the set $V^{\text{uncertain}} := \{v_1, \dots, v_n\}$. 
3: $n_{\text{clip}} \leftarrow 0$, $n_{\text{zero}} \leftarrow 0$, $s_{\text{clip}} \leftarrow 0$, $s_{\text{zero}} \leftarrow 0$, $s_{\text{all}} \leftarrow \sum_{i=1}^{n} v_i$. 
4: while $|B| > 2$ do 
5: $\theta_p \leftarrow \text{median}(B)$. 
6: Use $\theta_p$ to partition $V^{\text{uncertain}}$ into active set $S'_A$, clipped set $S'_C$, and zero set $S'_Z$. 
7: $n'_{\text{clip}} \leftarrow |S'_C|$, $s'_{\text{clip}} \leftarrow \text{sum}(V^{\text{uncertain}}_{S'_C}) - n'_{\text{clip}}$. $n'_{\text{zero}} \leftarrow |S'_Z|$, $s'_{\text{zero}} \leftarrow \text{sum}(V^{\text{uncertain}}_{S'_Z})$. 
8: Evaluate 
   $r_{\text{current}} = s_{\text{all}} - s_{\text{zero}} - s_{\text{clip}} - \theta_p (n - n_{\text{clip}} - n_{\text{zero}}) - s'_{\text{clip}} - s'_{\text{zero}} + \theta_p (n'_{\text{clip}} + n'_{\text{zero}})$. 
9: if $r_{\text{current}} > r$ then {Increase $\theta_p$. Fix the current zero set.} 
10: $s_{\text{zero}} \leftarrow s_{\text{zero}} + s'_{\text{zero}}$, $n_{\text{zero}} \leftarrow n_{\text{zero}} + n'_{\text{zero}}$. 
11: Remove $S'_Z$ entries from $V^{\text{uncertain}}$. 
12: Update $B$ by deleting elements less than $\theta_p$. 
13: else if $r_{\text{current}} < r$ then {Decrease $\theta_p$. Fix the current clipped set.} 
14: $s_{\text{clip}} \leftarrow s_{\text{clip}} + s'_{\text{clip}}$, $n_{\text{clip}} \leftarrow n_{\text{clip}} + n'_{\text{clip}}$. 
15: Remove $S'_C$ entries from $V^{\text{uncertain}}$. 
16: Update $B$ by deleting elements greater than $\theta_p$. 
17: else 
18: $\theta^* \leftarrow \theta_p$. {Success.} 
19: Determine the projection by applying $\theta^*$ to the KKT conditions. Return. 
20: end if 
21: end while {$B$ has two elements.} 
22: Using $\max B$, evaluate $S_A$, $S_C$, and $S_Z$ for $v$. 
23: $\theta^* \leftarrow \bigl(\sum_{i \in S_A} v_i + |S_C| - r\bigr) / |S_A|$. Determine the projection by applying $\theta^*$ to the KKT conditions. Return. 

For simplicity, we assume that the noise is additive and the decoder observes channel output 
$y = x + n$. In this scenario, the decoder needs to solve the following ML decoding problem 

$$
\begin{aligned}
\text{maximize} \quad & P[y \mid tX] \\
\text{subject to} \quad & |t_i - t_j| \ge \Delta \text{ for all } i \ne j, \\
& t \text{ is sorted in ascending order}, \\
& X \in \mathcal{C}.
\end{aligned}
$$

This problem involves multiplying two variables ($t$ and $X$) and thus cannot be written as a linear 
program. We explore two options to address this problem. 

11.1 Restricting the initial vector 

One natural idea for this problem is to enforce more constraints on $t$. For example, we can require 
that $t_{i+1} - t_i = \Delta$, where $\Delta$ is a constant known by both the encoder and the decoder. As a 
result, $t$ can be represented as $t = t_N + \eta$, where $t_N = (\Delta, 2\Delta, \dots, (m-1)\Delta, m\Delta)$ is a normalized 
vector and $\eta$ is a constant chosen by the encoder but not known to the decoder. Consequently, 
$y = tX + n = t_N X + \eta + n$. Therefore, the decoder can perform the following two-step decoding. 

1. Estimate $\eta$ by $\hat{\eta} = \frac{1}{n}\bigl(\sum_{j=1}^{n} y_j - r \sum_{i=1}^{m} i\Delta\bigr)$. 

2. Perform soft decoding (e.g., LP decoding) using $\hat{\eta}$. 

Although this method is easy, one needs to develop a matching write process for such initial vectors. 
In particular, over-injections and ranking modifications need to be taken care of. In these two 
scenarios, one needs to increase all other cells so that they satisfy the condition $t_{i+1} - t_i = \Delta$. On 
the other hand, this approach can reduce the number of rewrites of memories due to the restrictive 
choices of $t$. 
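
The offset estimate in step 1 amounts to subtracting the known sum of the normalized levels from the sum of the observations. The following sketch assumes that each of the $m$ levels appears exactly $r$ times (so $n = rm$ and the entries of $t_N X$ sum to $r\Delta\, m(m+1)/2$ regardless of $X$); the function name `estimate_offset` is hypothetical.

```python
def estimate_offset(y, m, r, delta):
    """Estimate eta in t = t_N + eta from y = t_N X + eta + noise.

    Assumes every one of the m levels is used exactly r times, so the
    entries of t_N X sum to r * delta * (1 + 2 + ... + m) for any X.
    """
    n = m * r
    if len(y) != n:
        raise ValueError("expected len(y) == m * r")
    level_sum = r * delta * m * (m + 1) / 2
    return (sum(y) - level_sum) / n

# Noise-free illustration with hypothetical values.
m, r, delta, eta = 3, 2, 1.0, 0.25
word = [1, 2, 3, 3, 1, 2]                  # level index stored at each position
y = [lvl * delta + eta for lvl in word]    # y = t_N X + eta, no noise
eta_hat = estimate_offset(y, m, r, delta)
```

With zero-mean noise the same estimator averages the noise over $n$ observations, which is why a single scalar offset is comparatively easy to recover.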

11.2 Turbo-equalization like decoding for initial vectors on grids 

We slightly relax the previous, restrictive conditions and consider the case where all initial vectors 
take on values that are multiples of $\Delta$. In other words, $t_i = k_i \Delta$, where $k_i \in \mathbb{Z}_+$ and $k_{i+1} \ge k_i + 1$. 
Furthermore, we assume the following largest cell condition: $t_m - t_{m-1} = \Delta$. The reason behind this 
assumption is that the ranking of the cells stays the same as long as $t_m \ge t_{m-1} + \Delta$. Thus, increasing 
$t_m$ during cell programming can reduce the number of rewrites before a block erasure. By leveraging 
these conditions, we propose the following iterative turbo-equalization decoding algorithm. 

(1) Decode using quantized ranking (by LP or bounded distance decoding), round all fractional 
solutions, and obtain $\hat{X}$. 

(2) Let $t^*$ be the solution of the following problem. 

$$
\begin{aligned}
\text{maximize} \quad & P[y \mid t\hat{X}] \\
\text{subject to} \quad & t_{i+1} - t_i \ge \Delta \text{ for all } i = 1, \dots, m-2, \\
& t_m - t_{m-1} = \Delta, \text{ and } t_1 \ge 0.
\end{aligned}
\tag{11.1}
$$

(3) Round $t^*$ to the nearest multiples of $\Delta$. Denote the result by $\hat{t}$. 

(4) Use $\hat{t}$ to perform soft decoding and update the estimate $\hat{X}$. 

(5) Repeat Steps (2) to (4), each time with the latest estimates. 

Figure 10: Word-error-rate (WER) of turbo-equalization decoding plotted as a function of 
signal-to-noise ratio (SNR) for the ST code defined by parameters r = 3, d = 4, and m = 16. 
The codeword transmitted is (1, ..., 16, 1, ..., 16, 1, ..., 16). 

For the AWGN channel, Step (2) can be formulated as a quadratic program and is solvable 
using off-the-shelf solvers. In Figure 10, we simulate the same code that we use in Figure 4. The 
true initial vector is (1, 2, ..., 16). In addition, $\Delta = 1$. In this experiment, we only use LP decoding 
of Chebyshev distance. The difference among the curves is the input to each decoder. For 
turbo-equalization decoding, we use hard LP decoding of Chebyshev distance based on quantized ranking, 
followed by one or two turbo iterations. In each turbo iteration, we first estimate the initial vector 
using (11.1), and then decode using soft LP decoding of Chebyshev distance based on the estimated 
initial vector. Note that the turbo-equalization decoder is not provided with the true initial vector. 
The other two curves are baseline curves from Figure 4. 
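
To make Steps (2) and (3) concrete for the AWGN case, the sketch below uses a deliberate simplification: it drops the spacing constraints of (11.1), in which case the least-squares estimate of each level is just the mean of the observations that the current hard decision assigns to that level, and then snaps the result to the $\Delta$ grid. It is therefore an unconstrained stand-in for the quadratic program, not the program itself. The function name and the representation of $\hat{X}$ as a list of level indices are assumptions made for illustration.

```python
def estimate_initial_vector(y, word, m, delta):
    """Unconstrained least-squares estimate of t, rounded to the Delta grid.

    word[j] in {0, ..., m-1} is the level index that the current hard
    decision assigns to position j (a stand-in for the matrix X-hat).
    The spacing constraints of (11.1) are NOT enforced here.
    """
    sums = [0.0] * m
    counts = [0] * m
    for yj, level in zip(y, word):
        sums[level] += yj
        counts[level] += 1
    if min(counts) == 0:
        raise ValueError("every level must appear at least once")
    means = [s / c for s, c in zip(sums, counts)]
    # Step (3): snap each estimate to the nearest multiple of delta.
    return [delta * round(mean / delta) for mean in means]

# Hypothetical noisy observations of levels (1, 2, 3) with delta = 1.
y = [1.1, 2.9, 2.05, 0.95, 1.9, 3.2]
word = [0, 2, 1, 0, 1, 2]
t_hat = estimate_initial_vector(y, word, m=3, delta=1.0)
```

A constrained solver would be needed whenever the per-level means violate the ordering or spacing conditions, which this simplified version does not detect.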

We observe that the turbo-equalization decoding technique performs as well as when the true 
initial vector is provided. In addition, one turbo iteration is sufficient for the case simulated in 
Figure 10. Further, we observe that using erroneous initial vectors in soft LP decoding of Chebyshev 
distance almost always yields incorrect decoding results (data not shown). On the other hand, we 
also observe that the initial vector estimation is not always accurate. 

We believe that the proposed algorithm is promising, but there are many open questions. Two 
important ones are, first, how to analyze the error performance of turbo-equalization decoding and, 
second, which classes of codes benefit from this scheme. 


References 

[1] D. Slepian, “Permutation modulation,” Proceedings of the IEEE , vol. 53, no. 3, pp. 228-236, 
Mar. 1965. 


[2] W. Chu, C. J. Colbourn, and P. Dukes, “Constructions for permutation codes in powerline 
communications,” Des. Codes Cryptography , vol. 32, no. 1-3, pp. 51-64, May 2004. 

[3] A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, “Rank modulation for flash memories,” in 
IEEE Int. Symp. Inf. Theory (ISIT), Toronto, Canada, July 2008, pp. 1731-1735. 

[4] C. Colbourn, T. Kløve, and A. Ling, “Permutation arrays for powerline communication and 
mutually orthogonal Latin squares,” IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1289-1291, 
June 2004. 

[5] A. Jiang, M. Schwartz, and J. Bruck, “Correcting charge-constrained errors in the rank-modulation 
scheme,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2112-2120, May 2010. 

[6] A. Barg and A. Mazumdar, “Codes in permutations and error correction for rank modulation,” 
IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3158-3165, July 2010. 

[7] Y. Yehezkeally and M. Schwartz, “Snake-in-the-box codes for rank modulation,” IEEE Trans. 
Inf. Theory, vol. 58, no. 8, pp. 5471-5483, Aug. 2012. 

[8] A. Mazumdar, A. Barg, and G. Zémor, “Constructions of rank modulation codes,” IEEE 
Trans. Inf. Theory, vol. 59, no. 2, pp. 1018-1029, Feb. 2013. 

[9] H. Zhou, M. Schwartz, A. Jiang, and J. Bruck, “Systematic error-correcting codes for rank 
modulation,” IEEE Trans. Inf. Theory, vol. 61, no. 1, pp. 17-32, Jan. 2015. 

[10] S. Buzaglo and T. Etzion, “Perfect permutation codes with the Kendall’s τ-metric,” in IEEE 
Int. Symp. Inf. Theory (ISIT), Honolulu, HI, USA, June 2014, pp. 2391-2395. 

[11] I. Tamo and M. Schwartz, “Correcting limited-magnitude errors in the rank-modulation 
scheme,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2551-2560, June 2010. 

[12] T. Kløve, T.-T. Lin, S.-C. Tsai, and W.-G. Tzeng, “Permutation arrays under the Chebyshev 
distance,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2611-2617, June 2010. 

[13] M. Schwartz and I. Tamo, “Optimal permutation anticodes with the infinity norm via 
permanents of (0,1)-matrices,” J. of Combin. Theory, Ser. A, vol. 118, no. 6, pp. 1761-1774, Aug. 
2011. 

[14] I. Tamo and M. Schwartz, “On the labeling problem of permutation group codes under the 
infinity metric,” IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6595-6604, Oct. 2012. 

[15] F. Farnoud, V. Skachek, and O. Milenkovic, “Error-correction in flash memories via codes in 
the Ulam metric,” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 3003-3020, May 2013. 

[16] T. Wadayama and M. Hagiwara, “LP-decodable permutation codes based on linearly con¬ 
strained permutation matrices,” IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5454-5470, Aug. 
2012. 

[17] W. Chu, C. J. Colbourn, and P. Dukes, “On constant composition codes,” Discrete Applied 
Mathematics, vol. 154, no. 6, pp. 912-929, Apr. 2006. 



[18] S. Huczynska and G. L. Mullen, “Frequency permutation arrays,” J. Combin. Designs , vol. 14, 
no. 6, pp. 463-478, Jan. 2006. 

[19] S. Buzaglo, E. Yaakobi, T. Etzion, and J. Bruck, “Error-correcting codes for multipermuta¬ 
tions,” in IEEE Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, July 2013, pp. 724-728. 

[20] F. Farnoud and O. Milenkovic, “Multipermutation codes in the Ulam metric for nonvolatile 
memories,” IEEE J. Select. Areas Commun., vol. 32, no. 5, pp. 919-932, May 2014. 

[21] F. Zhang, H. Pfister, and A. Jiang, “LDPC codes for rank modulation in flash memories,” in 
IEEE Int. Symp. Inf. Theory (ISIT), Austin, TX, USA, June 2010, pp. 859-863. 

[22] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: theory of majorization and its 
applications. Springer, 2009. 

[23] M.-Z. Shieh and S.-C. Tsai, “Decoding frequency permutation arrays under Chebyshev dis¬ 
tance,” IEEE Trans. Inf. Theory , vol. 56, no. 11, pp. 5730-5737, Nov. 2010. 

[24] S. Barman, X. Liu, S. C. Draper, and B. Recht, “Decomposition methods for large scale LP 
decoding,” IEEE Trans. Inf. Theory , vol. 59, no. 12, pp. 7870-7886, Dec. 2013. 

[25] X. Liu and S. C. Draper, “The ADMM penalized decoder for LDPC codes,” ArXiv preprint 
1409.5140, Sept. 2014. 

[26] X. Liu and S. C. Draper, “ADMM LP decoding of non-binary LDPC codes in $\mathbb{F}_{2^m}$,” ArXiv preprint 1409.5141, 
Sept. 2014. 

[27] A. Yufit, “On efficient linear programming decoding of HDPC codes,” Master’s thesis, Tel-Aviv 
University, Tel-Aviv, Israel, 2014. 

[28] X. Liu, “ADMM decoding of LDPC and multipermutation codes: from geometries to algo¬ 
rithms,” Ph.D. dissertation, University of Wisconsin-Madison, 2015. 

[29] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, “Efficient projections onto the $\ell_1$-ball 
for learning in high dimensions,” in Proc. Int. Conf. on Machine Learning (ICML), Helsinki, 
Finland, July 2008, pp. 272-279. 

[30] X. Liu and S. C. Draper, “LP-decodable multipermutation codes,” in Proc. Allerton Conf. on 
Comm., Control and Computing, Monticello, IL, USA, Oct. 2014, pp. 828-835. 

[31] M. Flanagan, V. Skachek, E. Byrne, and M. Greferath, “Linear-programming decoding of 
nonbinary linear codes,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4134-4154, Sept. 2009. 

[32] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 
2.1,” http://cvxr.com/cvx, Mar. 2014. 

[33] P. Savara, “Ranking and unranking permutations of multiset,” Oct. 2010, (available online 
at http://zamboch.blogspot.com/2007/10/ranking-and-unranking-permutations-of.html as of 
Jan. 2015). 

[34] D. E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 3: Generating All 
Combinations and Partitions. Addison-Wesley Professional, 2005. 


[35] M. Gupta, S. Kumar, and J. Xiao, “L1 projections with box constraints,” ArXiv preprint 
1010.0141, Oct. 2010. 

[36] W. H. Press, Numerical recipes 3rd edition: The art of scientific computing. Cambridge 
university press, 2007. 

[37] S. Barman, X. Liu, S. C. Draper, and B. Recht, “Decomposition methods for large scale LP 
decoding,” in Proc. Allerton Conf. on Comm., Control and Computing , Monticello, IL, USA, 
Sept. 2011, pp. 253-260. 

